Video Frame Prediction from a Single Image and Events

Juanjuan Zhu; Zhexiong Wan; Yuchao Dai

doi:10.1609/aaai.v38i7.28609

School of Electronics and Information

Northwestern Polytechnical University Xian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Recently, the task of Video Frame Prediction (VFP), which predicts future video frames from previous ones through extrapolation, has made remarkable progress. However, the performance of existing VFP methods is still far from satisfactory due to the fixed framerate video used: 1) they have difficulties in handling complex dynamic scenes; 2) they cannot predict future frames with flexible prediction time intervals. The event cameras can record the intensity changes asynchronously with a very high temporal resolution, which provides rich dynamic information about the observed scenes. In this paper, we propose to predict video frames from a single image and the following events, which can not only handle complex dynamic scenes but also predict future frames with flexible prediction time intervals. First, we introduce a symmetrical cross-modal attention augmentation module to enhance the complementary information between images and events. Second, we propose to jointly achieve optical flow estimation and frame generation by combining the motion information of events and the semantic information of the image, then inpainting the holes produced by forward warping to obtain an ideal prediction frame. Based on these, we propose a lightweight pyramidal coarse-to-fine model that can predict a 720P frame within 25 ms. Extensive experiments show that our proposed model significantly outperforms the state-of-the-art frame-based and event-based VFP methods and has the fastest runtime. Code is available at https://npucvr.github.io/VFPSIE/.
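
The abstract mentions "holes produced by forward warping". As a rough illustration of why such holes appear, the NumPy sketch below splats each source pixel to its flow-displaced position and reports the target pixels that received nothing. It is not the authors' implementation: the function name `forward_warp`, the nearest-neighbour splatting, and the zero fill value are all assumptions made for this example.

```python
import numpy as np

def forward_warp(image, flow):
    """Nearest-neighbour forward warping (illustrative sketch only).

    image: (H, W) or (H, W, C) array.
    flow:  (H, W, 2) array holding per-pixel displacement (dx, dy).
    Returns the warped image and a boolean mask that is True at holes,
    i.e. target pixels that no source pixel landed on.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Target coordinates of every source pixel, rounded to the pixel grid.
    tx = np.rint(xs + flow[..., 0]).astype(int)
    ty = np.rint(ys + flow[..., 1]).astype(int)
    # Discard pixels that move outside the frame.
    valid = (tx >= 0) & (tx < w) & (ty >= 0) & (ty < h)
    warped = np.zeros_like(image)
    filled = np.zeros((h, w), dtype=bool)
    # If several sources hit the same target, one of them simply overwrites the rest.
    warped[ty[valid], tx[valid]] = image[valid]
    filled[ty[valid], tx[valid]] = True
    return warped, ~filled

# Shift a 1x4 row one pixel to the right: the leftmost pixel becomes a hole.
img = np.array([[1.0, 2.0, 3.0, 4.0]])
flow = np.zeros((1, 4, 2))
flow[..., 0] = 1.0
warped, holes = forward_warp(img, flow)
```

In this toy case the hole mask marks the positions that a later inpainting stage would have to fill. A practical system would more likely use a differentiable splatting scheme than hard rounding.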

Original language	English
Title of host publication	Technical Tracks 14
Editors	Michael Wooldridge, Jennifer Dy, Sriraam Natarajan
Publisher	Association for the Advancement of Artificial Intelligence
Pages	7748-7756
Number of pages	9
Edition	7
ISBN (Electronic)	1577358872, 9781577358879
DOI	https://doi.org/10.1609/aaai.v38i7.28609
Publication status	Published - 25 Mar 2024
Event	38th AAAI Conference on Artificial Intelligence, AAAI 2024 - Vancouver, Canada. Duration: 20 Feb 2024 → 27 Feb 2024

Publication series

Name	Proceedings of the AAAI Conference on Artificial Intelligence
Number	7
Volume	38
ISSN (Print)	2159-5399
ISSN (Electronic)	2374-3468

Conference

Conference	38th AAAI Conference on Artificial Intelligence, AAAI 2024
Country/Territory	Canada
City	Vancouver
Period	20/02/24 → 27/02/24

Cite this

Zhu, J., Wan, Z., & Dai, Y. (2024). Video Frame Prediction from a Single Image and Events. In M. Wooldridge, J. Dy, & S. Natarajan (Eds.), Technical Tracks 14 (7 ed., pp. 7748-7756). (Proceedings of the AAAI Conference on Artificial Intelligence; Vol. 38, No. 7). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v38i7.28609

@inproceedings{27ddb14274dc4edd92f07a8c9f588f55,

title = "Video Frame Prediction from a Single Image and Events",

author = "Juanjuan Zhu and Zhexiong Wan and Yuchao Dai",

note = "Publisher Copyright: Copyright {\textcopyright} 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.; 38th AAAI Conference on Artificial Intelligence, AAAI 2024 ; Conference date: 20-02-2024 Through 27-02-2024",

year = "2024",

month = mar,

day = "25",

doi = "10.1609/aaai.v38i7.28609",

language = "English",

series = "Proceedings of the AAAI Conference on Artificial Intelligence",

publisher = "Association for the Advancement of Artificial Intelligence",

number = "7",

pages = "7748--7756",

editor = "Michael Wooldridge and Jennifer Dy and Sriraam Natarajan",

booktitle = "Technical Tracks 14",

edition = "7",

}

Zhu, J, Wan, Z & Dai, Y 2024, Video Frame Prediction from a Single Image and Events. in M Wooldridge, J Dy & S Natarajan (eds), Technical Tracks 14. 7 edn, Proceedings of the AAAI Conference on Artificial Intelligence, no. 7, vol. 38, Association for the Advancement of Artificial Intelligence, pp. 7748-7756, 38th AAAI Conference on Artificial Intelligence, AAAI 2024, Vancouver, Canada, 20/02/24. https://doi.org/10.1609/aaai.v38i7.28609

Video Frame Prediction from a Single Image and Events. / Zhu, Juanjuan; Wan, Zhexiong; Dai, Yuchao.
Technical Tracks 14. ed. / Michael Wooldridge; Jennifer Dy; Sriraam Natarajan. 7. ed. Association for the Advancement of Artificial Intelligence, 2024. p. 7748-7756 (Proceedings of the AAAI Conference on Artificial Intelligence; Vol. 38, No. 7).

TY - GEN

T1 - Video Frame Prediction from a Single Image and Events

AU - Zhu, Juanjuan

AU - Wan, Zhexiong

AU - Dai, Yuchao

PY - 2024/3/25

Y1 - 2024/3/25

UR - http://www.scopus.com/inward/record.url?scp=85189502955&partnerID=8YFLogxK

U2 - 10.1609/aaai.v38i7.28609

DO - 10.1609/aaai.v38i7.28609

M3 - Conference contribution

AN - SCOPUS:85189502955

T3 - Proceedings of the AAAI Conference on Artificial Intelligence

SP - 7748

EP - 7756

BT - Technical Tracks 14

A2 - Wooldridge, Michael

A2 - Dy, Jennifer

A2 - Natarajan, Sriraam

PB - Association for the Advancement of Artificial Intelligence

T2 - 38th AAAI Conference on Artificial Intelligence, AAAI 2024

Y2 - 20 February 2024 through 27 February 2024

ER -
