Information Lossless Multi-modal Image Generation for RGB-T Tracking

Fan Li; Yufei Zha; Lichao Zhang; Peng Zhang; Lang Chen

doi:10.1007/978-3-031-18916-6_53

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

Visible-thermal infrared (RGB-T) multimodal target representation is a key factor in RGB-T tracking performance. Training an RGB-T fusion tracker end-to-end is difficult because annotated RGB-T image pairs are scarce. To alleviate this problem, we propose an information-lossless RGB-T image pair generation method. We generate thermal infrared (TIR) data from large-scale labeled RGB data, and the resulting aligned, labeled RGB-T pairs are used for RGB-T fusion target tracking. Unlike traditional image modality conversion models, our method uses a reversible neural network to convert RGB images into TIR images, which allows it to generate TIR data without information loss. Specifically, we design reversible modules and reversible operations for the RGB-T modality conversion task by exploiting the properties of the reversible network structure. The network therefore loses no information, and it is trained on a large amount of aligned RGB-T data. Finally, the trained model is added to the RGB-T fusion tracking framework to generate paired RGB-T images end-to-end. We conduct extensive experiments on the VOT-RGBT2020 [14] and RGBT234 [16] datasets; the results show that our method obtains better RGB-T fusion features for representing the target. Compared with the baseline, performance improves by 4.6% in EAO on VOT-RGBT2020 [14] and by 4.9% in precision rate on RGBT234 [16].
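The abstract does not spell out the reversible modules, so the following is only a minimal sketch of why a reversible block loses no information. It assumes an additive coupling design of the kind used in invertible networks generally, which may differ from the authors' modules. The names `make_subnet`, `coupling_forward`, and `coupling_inverse` are hypothetical, and the random weights stand in for learned sub-networks.

```python
import math
import random


def make_subnet(seed, size):
    """Build a fixed nonlinear map (hypothetical stand-in for a learned sub-network).

    The sub-network itself does not have to be invertible.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(size)] for _ in range(size)]

    def subnet(values):
        # One dense layer followed by tanh.
        return [math.tanh(sum(w * v for w, v in zip(row, values))) for row in weights]

    return subnet


def coupling_forward(x, f, g):
    """Additive coupling: split the input in two halves and update each in turn."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1 + y2


def coupling_inverse(y, f, g):
    """Undo coupling_forward by subtracting the same terms in reverse order."""
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1 + x2


# Toy feature vector standing in for flattened image features (assumed values).
features = [0.2, -1.3, 0.7, 0.05]
f = make_subnet(seed=1, size=2)
g = make_subnet(seed=2, size=2)

encoded = coupling_forward(features, f, g)
recovered = coupling_inverse(encoded, f, g)
max_error = max(abs(a - b) for a, b in zip(recovered, features))
print("max reconstruction error:", max_error)
```

Because the inverse only subtracts what the forward pass added, the input is recovered up to floating-point rounding regardless of what `f` and `g` compute. That is the property the abstract refers to as information lossless.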

Original language	English
Title of host publication	Pattern Recognition and Computer Vision - 5th Chinese Conference, PRCV 2022, Proceedings
Editors	Shiqi Yu, Jianguo Zhang, Zhaoxiang Zhang, Tieniu Tan, Pong C. Yuen, Yike Guo, Junwei Han, Jianhuang Lai
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	671-683
Number of pages	13
ISBN (Print)	9783031189159
DOIs	https://doi.org/10.1007/978-3-031-18916-6_53
State	Published - 2022
Event	5th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2022 - Shenzhen, China, 4 Nov 2022 → 7 Nov 2022

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	13537 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	5th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2022
Country/Territory	China
City	Shenzhen
Period	4 Nov 2022 → 7 Nov 2022

Keywords

Data generation
Reversible network
RGB-T tracking

Access to Document

https://doi.org/10.1007/978-3-031-18916-6_53

Cite this

Li, F., Zha, Y., Zhang, L., Zhang, P., & Chen, L. (2022). Information Lossless Multi-modal Image Generation for RGB-T Tracking. In S. Yu, J. Zhang, Z. Zhang, T. Tan, P. C. Yuen, Y. Guo, J. Han, & J. Lai (Eds.), Pattern Recognition and Computer Vision - 5th Chinese Conference, PRCV 2022, Proceedings (pp. 671-683). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13537 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-18916-6_53

Li, Fan ; Zha, Yufei ; Zhang, Lichao et al. / Information Lossless Multi-modal Image Generation for RGB-T Tracking. Pattern Recognition and Computer Vision - 5th Chinese Conference, PRCV 2022, Proceedings. editor / Shiqi Yu ; Jianguo Zhang ; Zhaoxiang Zhang ; Tieniu Tan ; Pong C. Yuen ; Yike Guo ; Junwei Han ; Jianhuang Lai. Springer Science and Business Media Deutschland GmbH, 2022. pp. 671-683 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{01b3197e254a4d519f59c832aee56dcb,

title = "Information Lossless Multi-modal Image Generation for RGB-T Tracking",

keywords = "Data generation, Reversible network, RGB-T tracking",

author = "Fan Li and Yufei Zha and Lichao Zhang and Peng Zhang and Lang Chen",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive license to Springer Nature Switzerland AG 2022.; 5th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2022 ; Conference date: 04-11-2022 Through 07-11-2022",

year = "2022",

doi = "10.1007/978-3-031-18916-6_53",

language = "English",

isbn = "9783031189159",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "671--683",

editor = "Shiqi Yu and Jianguo Zhang and Zhaoxiang Zhang and Tieniu Tan and Yuen, {Pong C.} and Yike Guo and Junwei Han and Jianhuang Lai",

booktitle = "Pattern Recognition and Computer Vision - 5th Chinese Conference, PRCV 2022, Proceedings",

}

Li, F, Zha, Y, Zhang, L, Zhang, P & Chen, L 2022, Information Lossless Multi-modal Image Generation for RGB-T Tracking. in S Yu, J Zhang, Z Zhang, T Tan, PC Yuen, Y Guo, J Han & J Lai (eds), Pattern Recognition and Computer Vision - 5th Chinese Conference, PRCV 2022, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13537 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 671-683, 5th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2022, Shenzhen, China, 4/11/22. https://doi.org/10.1007/978-3-031-18916-6_53

Information Lossless Multi-modal Image Generation for RGB-T Tracking. / Li, Fan; Zha, Yufei; Zhang, Lichao et al.
Pattern Recognition and Computer Vision - 5th Chinese Conference, PRCV 2022, Proceedings. ed. / Shiqi Yu; Jianguo Zhang; Zhaoxiang Zhang; Tieniu Tan; Pong C. Yuen; Yike Guo; Junwei Han; Jianhuang Lai. Springer Science and Business Media Deutschland GmbH, 2022. p. 671-683 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13537 LNCS).

TY - GEN

T1 - Information Lossless Multi-modal Image Generation for RGB-T Tracking

AU - Li, Fan

AU - Zha, Yufei

AU - Zhang, Lichao

AU - Zhang, Peng

AU - Chen, Lang

N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022.

PY - 2022

Y1 - 2022

KW - Data generation

KW - Reversible network

KW - RGB-T tracking

UR - http://www.scopus.com/inward/record.url?scp=85142838412&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-18916-6_53

DO - 10.1007/978-3-031-18916-6_53

M3 - Conference contribution

AN - SCOPUS:85142838412

SN - 9783031189159

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 671

EP - 683

BT - Pattern Recognition and Computer Vision - 5th Chinese Conference, PRCV 2022, Proceedings

A2 - Yu, Shiqi

A2 - Zhang, Jianguo

A2 - Zhang, Zhaoxiang

A2 - Tan, Tieniu

A2 - Yuen, Pong C.

A2 - Guo, Yike

A2 - Han, Junwei

A2 - Lai, Jianhuang

PB - Springer Science and Business Media Deutschland GmbH

T2 - 5th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2022

Y2 - 4 November 2022 through 7 November 2022

ER -

Li F, Zha Y, Zhang L, Zhang P, Chen L. Information Lossless Multi-modal Image Generation for RGB-T Tracking. In Yu S, Zhang J, Zhang Z, Tan T, Yuen PC, Guo Y, Han J, Lai J, editors, Pattern Recognition and Computer Vision - 5th Chinese Conference, PRCV 2022, Proceedings. Springer Science and Business Media Deutschland GmbH. 2022. p. 671-683. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-18916-6_53
