TY - GEN
T1 - Exploiting temporal consistency for real-time video depth estimation
AU - Zhang, Haokui
AU - Shen, Chunhua
AU - Li, Ying
AU - Cao, Yuanzhouhan
AU - Liu, Yu
AU - Yan, Youliang
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
AB - The accuracy of depth estimation from static images has improved significantly in recent years by exploiting hierarchical features from deep convolutional neural networks (CNNs). Compared with static images, video frames contain a wealth of information that can be exploited to improve depth estimation performance. In this work, we focus on exploring temporal information from monocular videos for depth estimation. Specifically, we take advantage of convolutional long short-term memory (CLSTM) and propose a novel spatial-temporal CLSTM (ST-CLSTM) structure. Our ST-CLSTM structure captures not only spatial features but also the temporal correlations/consistency among consecutive video frames, with a negligible increase in computational cost. Additionally, to maintain temporal consistency among the estimated depth frames, we apply a generative adversarial learning scheme and design a temporal consistency loss. The temporal consistency loss is combined with the spatial loss to update the model in an end-to-end fashion. By taking advantage of the temporal information, we build a video depth estimation framework that runs in real time and generates visually pleasing results. Moreover, our approach is flexible and can be generalized to most existing depth estimation frameworks. Code is available at: https://tinyurl.com/STCLSTM.
UR - http://www.scopus.com/inward/record.url?scp=85077966011&partnerID=8YFLogxK
DO - 10.1109/ICCV.2019.00181
M3 - Conference contribution
AN - SCOPUS:85077966011
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 1725
EP - 1734
BT - Proceedings - 2019 International Conference on Computer Vision, ICCV 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 17th IEEE/CVF International Conference on Computer Vision, ICCV 2019
Y2 - 27 October 2019 through 2 November 2019
ER -