Spatio-Temporal Interactive Learning for Efficient Image Reconstruction of Spiking Cameras

Bin Fan; Jiaoyang Yin; Yuchao Dai; Chao Xu; Tiejun Huang; Boxin Shi

School of Electronics and Information

Peking University

Research output: Contribution to journal › Conference article › peer-review

Abstract

The spiking camera is an emerging neuromorphic vision sensor that records high-speed motion scenes by asynchronously firing continuous binary spike streams. Prevailing image reconstruction methods, generating intermediate frames from these spike streams, often rely on complex step-by-step network architectures that overlook the intrinsic collaboration of spatio-temporal complementary information. In this paper, we propose an efficient spatio-temporal interactive reconstruction network to jointly perform inter-frame feature alignment and intra-frame feature filtering in a coarse-to-fine manner. Specifically, it starts by extracting hierarchical features from a concise hybrid spike representation, then refines the motion fields and target frames scale-by-scale, ultimately obtaining a full-resolution output. Meanwhile, we introduce a symmetric interactive attention block and a multi-motion field estimation block to further enhance the interaction capability of the overall network. Experiments on synthetic and real-captured data show that our approach exhibits excellent performance while maintaining low model complexity. The code is available at https://github.com/GitCVfb/STIR.

Original language	English
Journal	Advances in Neural Information Processing Systems
Volume	37
State	Published - 2024
Event	38th Conference on Neural Information Processing Systems, NeurIPS 2024 - Vancouver, Canada
Duration	9 Dec 2024 → 15 Dec 2024

Cite this

@article{b3e43a6b37e640939bc676746f8a62c8,

title = "Spatio-Temporal Interactive Learning for Efficient Image Reconstruction of Spiking Cameras",

abstract = "The spiking camera is an emerging neuromorphic vision sensor that records high-speed motion scenes by asynchronously firing continuous binary spike streams. Prevailing image reconstruction methods, generating intermediate frames from these spike streams, often rely on complex step-by-step network architectures that overlook the intrinsic collaboration of spatio-temporal complementary information. In this paper, we propose an efficient spatio-temporal interactive reconstruction network to jointly perform inter-frame feature alignment and intra-frame feature filtering in a coarse-to-fine manner. Specifically, it starts by extracting hierarchical features from a concise hybrid spike representation, then refines the motion fields and target frames scale-by-scale, ultimately obtaining a full-resolution output. Meanwhile, we introduce a symmetric interactive attention block and a multi-motion field estimation block to further enhance the interaction capability of the overall network. Experiments on synthetic and real-captured data show that our approach exhibits excellent performance while maintaining low model complexity. The code is available at https://github.com/GitCVfb/STIR.",

author = "Bin Fan and Jiaoyang Yin and Yuchao Dai and Chao Xu and Tiejun Huang and Boxin Shi",

note = "Publisher Copyright: {\textcopyright} 2024 Neural information processing systems foundation. All rights reserved.; 38th Conference on Neural Information Processing Systems, NeurIPS 2024 ; Conference date: 09-12-2024 Through 15-12-2024",

year = "2024",

language = "English",

volume = "37",

journal = "Advances in Neural Information Processing Systems",

issn = "1049-5258",

publisher = "Neural information processing systems foundation",

}

TY - JOUR

T1 - Spatio-Temporal Interactive Learning for Efficient Image Reconstruction of Spiking Cameras

AU - Fan, Bin

AU - Yin, Jiaoyang

AU - Dai, Yuchao

AU - Xu, Chao

AU - Huang, Tiejun

AU - Shi, Boxin

PY - 2024

Y1 - 2024

N2 - The spiking camera is an emerging neuromorphic vision sensor that records high-speed motion scenes by asynchronously firing continuous binary spike streams. Prevailing image reconstruction methods, generating intermediate frames from these spike streams, often rely on complex step-by-step network architectures that overlook the intrinsic collaboration of spatio-temporal complementary information. In this paper, we propose an efficient spatio-temporal interactive reconstruction network to jointly perform inter-frame feature alignment and intra-frame feature filtering in a coarse-to-fine manner. Specifically, it starts by extracting hierarchical features from a concise hybrid spike representation, then refines the motion fields and target frames scale-by-scale, ultimately obtaining a full-resolution output. Meanwhile, we introduce a symmetric interactive attention block and a multi-motion field estimation block to further enhance the interaction capability of the overall network. Experiments on synthetic and real-captured data show that our approach exhibits excellent performance while maintaining low model complexity. The code is available at https://github.com/GitCVfb/STIR.

UR - http://www.scopus.com/inward/record.url?scp=105000489676&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:105000489676

SN - 1049-5258

VL - 37

JO - Advances in Neural Information Processing Systems

JF - Advances in Neural Information Processing Systems

T2 - 38th Conference on Neural Information Processing Systems, NeurIPS 2024

Y2 - 9 December 2024 through 15 December 2024

ER -
