Information Lossless Multi-modal Image Generation for RGB-T Tracking

Fan Li, Yufei Zha, Lichao Zhang, Peng Zhang, Lang Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

Visible-thermal infrared (RGB-T) multimodal target representation is a key factor in RGB-T tracking performance. Training an RGB-T fusion tracker end-to-end is difficult because annotated RGB-T image pairs are scarce. To alleviate this problem, we propose an information-lossless method for generating RGB-T image pairs. We generate thermal infrared (TIR) data from large-scale labeled RGB data, and the resulting aligned, labeled RGB-T pairs are used for RGB-T fusion target tracking. Unlike traditional image modality conversion models, our method uses a reversible neural network to convert RGB images into TIR images, which allows it to generate TIR data without information loss. Specifically, we design reversible modules and reversible operations for the RGB-T modality conversion task by exploiting the properties of the reversible network structure. The network therefore loses no information, and it is trained on a large amount of aligned RGB-T data. Finally, the trained model is added to the RGB-T fusion tracking framework to generate paired RGB-T images end-to-end. We conduct extensive experiments on the VOT-RGBT2020 [14] and RGBT234 [16] datasets. The results show that our method obtains better RGB-T fusion features for representing the target: it outperforms the baseline by 4.6% in EAO on VOT-RGBT2020 and by 4.9% in precision rate on RGBT234.
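The abstract does not specify the reversible modules, so the following is only a minimal sketch of the general idea behind a reversible (invertible) block, using a generic additive coupling layer rather than the authors' design. The names `f`, `forward`, and `inverse` are hypothetical, and `f` stands in for an arbitrary learned sub-network. The point it illustrates is that the input can be recovered from the output, so the mapping discards no information.

```python
import math
import random


def f(x1):
    """Hypothetical stand-in for a learned sub-network (assumption).

    It does not need to be invertible itself; any deterministic function works.
    """
    return [math.tanh(3.0 * v) + 0.5 * v * v for v in x1]


def forward(x):
    """Additive coupling: y1 = x1, y2 = x2 + f(x1). Expects an even-length list."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    y2 = [b + t for b, t in zip(x2, f(x1))]
    return x1 + y2


def inverse(y):
    """Exact inverse of forward: x1 = y1, x2 = y2 - f(y1)."""
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    x2 = [b - t for b, t in zip(y2, f(y1))]
    return y1 + x2


random.seed(0)
x = [random.random() for _ in range(8)]  # toy feature vector, not real image data
x_rec = inverse(forward(x))

# Reconstruction error is limited only by floating-point rounding.
max_err = max(abs(a - b) for a, b in zip(x, x_rec))
print(max_err)
```

Because the first half of the input passes through unchanged, the inverse can recompute `f` on it and subtract the result, which is why such blocks can be stacked without losing information about the input.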

Original language: English
Title of host publication: Pattern Recognition and Computer Vision - 5th Chinese Conference, PRCV 2022, Proceedings
Editors: Shiqi Yu, Jianguo Zhang, Zhaoxiang Zhang, Tieniu Tan, Pong C. Yuen, Yike Guo, Junwei Han, Jianhuang Lai
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 671-683
Number of pages: 13
ISBN (Print): 9783031189159
State: Published - 2022
Event: 5th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2022 - Shenzhen, China
Duration: 4 Nov 2022 to 7 Nov 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13537 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 5th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2022
Country/Territory: China
City: Shenzhen
Period: 4/11/22 to 7/11/22

Keywords

  • Data generation
  • Reversible network
  • RGB-T tracking
