TY - JOUR
T1 - Generative Transformer for Accurate and Reliable Salient Object Detection
AU - Mao, Yuxin
AU - Zhang, Jing
AU - Wan, Zhexiong
AU - Tian, Xinyu
AU - Li, Aixuan
AU - Lv, Yunqiu
AU - Dai, Yuchao
N1 - Publisher Copyright:
© 1991-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - We explore the impact of transformers on accurate and reliable salient object detection. For accuracy, we integrate the transformer with a deterministic model and delineate its advantages in structural modeling. Regarding reliability, we address the transformer's tendency to produce overly confident, incorrect predictions. To gauge reliability implicitly, we introduce a latent variable model within the transformer framework, termed the inferential generative adversarial network (iGAN). The stochastic nature of the latent variable facilitates the estimation of predictive uncertainty, which serves as an auxiliary measure of the model's prediction reliability. Unlike the conventional GAN, which fixes the distribution of the latent variable as a standard normal distribution N(0, I), the proposed iGAN infers the latent variable by gradient-based Markov chain Monte Carlo (MCMC), namely Langevin dynamics, leading to an input-dependent latent variable model. We apply our proposed iGAN to fully supervised salient object detection, showing that iGAN within the transformer framework leads to both accurate and reliable salient object detection. The source code and experimental results are publicly available via our project page: https://npucvr.github.io/TransformerSOD.
AB - We explore the impact of transformers on accurate and reliable salient object detection. For accuracy, we integrate the transformer with a deterministic model and delineate its advantages in structural modeling. Regarding reliability, we address the transformer's tendency to produce overly confident, incorrect predictions. To gauge reliability implicitly, we introduce a latent variable model within the transformer framework, termed the inferential generative adversarial network (iGAN). The stochastic nature of the latent variable facilitates the estimation of predictive uncertainty, which serves as an auxiliary measure of the model's prediction reliability. Unlike the conventional GAN, which fixes the distribution of the latent variable as a standard normal distribution N(0, I), the proposed iGAN infers the latent variable by gradient-based Markov chain Monte Carlo (MCMC), namely Langevin dynamics, leading to an input-dependent latent variable model. We apply our proposed iGAN to fully supervised salient object detection, showing that iGAN within the transformer framework leads to both accurate and reliable salient object detection. The source code and experimental results are publicly available via our project page: https://npucvr.github.io/TransformerSOD.
KW - inferential generative adversarial network
KW - salient object detection
KW - Vision transformer
UR - http://www.scopus.com/inward/record.url?scp=85205319292&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2024.3469286
DO - 10.1109/TCSVT.2024.3469286
M3 - Article
AN - SCOPUS:85205319292
SN - 1051-8215
VL - 35
SP - 1041
EP - 1054
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 2
ER -