TY - JOUR
T1 - H3T
T2 - Hierarchical Transferable Transformer with TokenMix for Unsupervised Domain Adaptation
AU - Ren, Yihua
AU - Gao, Junyu
AU - Yuan, Yuan
N1 - Publisher Copyright:
© 2024
PY - 2025/3/1
Y1 - 2025/3/1
N2 - Recent research has focused on exploring the capabilities of Vision Transformers (ViTs) in Unsupervised Domain Adaptation (UDA). This line of work typically pays greater attention to fine-grained common information through patch-level transferable discrimination. However, prematurely assigning narrow-range transferability information at the encoding stage can sparsify image information, thereby increasing the difficulty of downstream tasks. Therefore, we propose a Hierarchical Transferable Transformer with TokenMix (H3T), which maintains the allocation of fine-grained transferability at the encoding stage while strengthening the learning of image information through feature mixup. To address the challenge of missing sample labels in the target domain of the domain adaptation task, we specifically design the TokenMix Module (TMM) for ViTs. This module learns style information from both domains while alleviating the impact of image sparsity on downstream tasks. Furthermore, to enhance the semantic connections among narrow-range image transfer messages, we propose the Hierarchical Discriminative Module (HDM), which also plays a critical role in encoding discriminative information. Our approach is evaluated through comprehensive experiments on five datasets of varying sizes, demonstrating its effectiveness. Our code is available at https://github.com/reyihua/H3T.
AB - Recent research has focused on exploring the capabilities of Vision Transformers (ViTs) in Unsupervised Domain Adaptation (UDA). This line of work typically pays greater attention to fine-grained common information through patch-level transferable discrimination. However, prematurely assigning narrow-range transferability information at the encoding stage can sparsify image information, thereby increasing the difficulty of downstream tasks. Therefore, we propose a Hierarchical Transferable Transformer with TokenMix (H3T), which maintains the allocation of fine-grained transferability at the encoding stage while strengthening the learning of image information through feature mixup. To address the challenge of missing sample labels in the target domain of the domain adaptation task, we specifically design the TokenMix Module (TMM) for ViTs. This module learns style information from both domains while alleviating the impact of image sparsity on downstream tasks. Furthermore, to enhance the semantic connections among narrow-range image transfer messages, we propose the Hierarchical Discriminative Module (HDM), which also plays a critical role in encoding discriminative information. Our approach is evaluated through comprehensive experiments on five datasets of varying sizes, demonstrating its effectiveness. Our code is available at https://github.com/reyihua/H3T.
KW - Adversarial learning
KW - Domain adaptation
KW - Mix-up
KW - Vision transformers
UR - http://www.scopus.com/inward/record.url?scp=85207934600&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2024.125543
DO - 10.1016/j.eswa.2024.125543
M3 - Article
AN - SCOPUS:85207934600
SN - 0957-4174
VL - 262
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 125543
ER -