Spatio-Temporal Interactive Learning for Efficient Image Reconstruction of Spiking Cameras

Bin Fan, Jiaoyang Yin, Yuchao Dai, Chao Xu, Tiejun Huang, Boxin Shi

Research output: Contribution to journalConference articlepeer-review

Abstract

The spiking camera is an emerging neuromorphic vision sensor that records high-speed motion scenes by asynchronously firing continuous binary spike streams. Prevailing image reconstruction methods, generating intermediate frames from these spike streams, often rely on complex step-by-step network architectures that overlook the intrinsic collaboration of spatio-temporal complementary information. In this paper, we propose an efficient spatio-temporal interactive reconstruction network to jointly perform inter-frame feature alignment and intra-frame feature filtering in a coarse-to-fine manner. Specifically, it starts by extracting hierarchical features from a concise hybrid spike representation, then refines the motion fields and target frames scale-by-scale, ultimately obtaining a full-resolution output. Meanwhile, we introduce a symmetric interactive attention block and a multi-motion field estimation block to further enhance the interaction capability of the overall network. Experiments on synthetic and real-captured data show that our approach exhibits excellent performance while maintaining low model complexity. The code is available at https://github.com/GitCVfb/STIR.

Original languageEnglish
JournalAdvances in Neural Information Processing Systems
Volume37
StatePublished - 2024
Event38th Conference on Neural Information Processing Systems, NeurIPS 2024 - Vancouver, Canada
Duration: 9 Dec 202415 Dec 2024

Fingerprint

Dive into the research topics of 'Spatio-Temporal Interactive Learning for Efficient Image Reconstruction of Spiking Cameras'. Together they form a unique fingerprint.

Cite this