Disentangling Deep Network for Reconstructing 3D Object Shapes from Single 2D Images

Yang Yang; Junwei Han; Dingwen Zhang; De Cheng

doi:10.1007/978-3-030-88007-1_13

School of Automation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Recovering 3D shapes of deformable objects from single 2D images is an extremely challenging and ill-posed problem. Most existing approaches are based on structure-from-motion or graph inference, where a 3D shape is recovered by fitting 2D keypoints or masks instead of directly using the vital cues in the original 2D image. These methods usually require multiple views of an object instance and rely on accurate labeling, detection, and matching of 2D keypoints or masks across multiple images. To overcome these limitations, we aim to reconstruct 3D deformable object shapes directly from the given unconstrained 2D images. In training, instead of using multiple images per object instance, our approach relaxes the constraint and uses images from the same object category (with one 2D image per object instance). The key is to disentangle the category-specific representation of the 3D shape identity and the instance-specific representation of the 3D shape displacement from the 2D training images. In testing, the 3D shape of an object can be reconstructed from the given image by deforming the 3D shape identity according to the 3D shape displacement. To achieve this goal, we propose a novel convolutional encoder-decoder network, the Disentangling Deep Network (DisDN). To demonstrate the effectiveness of the proposed approach, we conduct comprehensive experiments on the challenging PASCAL VOC benchmark and use different 3D shape ground truth in training and testing to avoid overfitting. The experimental results show that DisDN outperforms other state-of-the-art and baseline methods.
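
The test-time step described in the abstract, deforming a category-level shape identity by an instance-level displacement, can be illustrated with a small sketch. This is not the authors' implementation. It assumes that both quantities are point clouds with the same number of points and that the deformation is a per-point addition, neither of which the abstract states. The function name `deform_shape` and the toy coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def deform_shape(identity: List[Point], displacement: List[Point]) -> List[Point]:
    """Return identity + displacement, point by point.

    Assumption (not from the paper): the displacement is a per-point
    3D offset that is simply added to the shape identity.
    """
    if len(identity) != len(displacement):
        raise ValueError("identity and displacement must have the same number of points")
    return [
        (px + dx, py + dy, pz + dz)
        for (px, py, pz), (dx, dy, dz) in zip(identity, displacement)
    ]


# Toy values for illustration only; they are not taken from the paper.
identity = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
displacement = [(0.0, 0.0, 0.5), (0.25, 0.0, 0.0), (0.0, -0.5, 0.0)]

shape = deform_shape(identity, displacement)
print(shape)
```

In this reading, the identity would be shared by every instance of a category, while the displacement would be predicted separately for each input image.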

Original language	English
Title of host publication	Pattern Recognition and Computer Vision - 4th Chinese Conference, PRCV 2021, Proceedings
Editors	Huimin Ma, Liang Wang, Changshui Zhang, Fei Wu, Tieniu Tan, Yaonan Wang, Jianhuang Lai, Yao Zhao
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	153-166
Number of pages	14
ISBN (Print)	9783030880064
DOIs	https://doi.org/10.1007/978-3-030-88007-1_13
State	Published - 2021
Event	4th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2021 - Beijing, China; Duration: 29 Oct 2021 → 1 Nov 2021

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	13020 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	4th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2021
Country/Territory	China
City	Beijing
Period	29/10/21 → 1/11/21

Keywords

3D shape reconstruction
Disentangling deep network
Point cloud

Cite this

Yang, Y., Han, J., Zhang, D., & Cheng, D. (2021). Disentangling Deep Network for Reconstructing 3D Object Shapes from Single 2D Images. In H. Ma, L. Wang, C. Zhang, F. Wu, T. Tan, Y. Wang, J. Lai, & Y. Zhao (Eds.), Pattern Recognition and Computer Vision - 4th Chinese Conference, PRCV 2021, Proceedings (pp. 153-166). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13020 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-88007-1_13

@inproceedings{9cba25be9eb240779d95fe41a655bec4,

title = "Disentangling Deep Network for Reconstructing 3D Object Shapes from Single 2D Images",

keywords = "3D shape reconstruction, Disentangling deep network, Point cloud",

author = "Yang Yang and Junwei Han and Dingwen Zhang and De Cheng",

note = "Publisher Copyright: {\textcopyright} 2021, Springer Nature Switzerland AG.; 4th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2021 ; Conference date: 29-10-2021 Through 01-11-2021",

year = "2021",

doi = "10.1007/978-3-030-88007-1_13",

language = "English",

isbn = "9783030880064",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "153--166",

editor = "Huimin Ma and Liang Wang and Changshui Zhang and Fei Wu and Tieniu Tan and Yaonan Wang and Jianhuang Lai and Yao Zhao",

booktitle = "Pattern Recognition and Computer Vision - 4th Chinese Conference, PRCV 2021, Proceedings",

}

Yang, Y, Han, J, Zhang, D & Cheng, D 2021, Disentangling Deep Network for Reconstructing 3D Object Shapes from Single 2D Images. in H Ma, L Wang, C Zhang, F Wu, T Tan, Y Wang, J Lai & Y Zhao (eds), Pattern Recognition and Computer Vision - 4th Chinese Conference, PRCV 2021, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13020 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 153-166, 4th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2021, Beijing, China, 29/10/21. https://doi.org/10.1007/978-3-030-88007-1_13

TY - GEN

T1 - Disentangling Deep Network for Reconstructing 3D Object Shapes from Single 2D Images

AU - Yang, Yang

AU - Han, Junwei

AU - Zhang, Dingwen

AU - Cheng, De

PY - 2021

Y1 - 2021

KW - 3D shape reconstruction

KW - Disentangling deep network

KW - Point cloud

UR - http://www.scopus.com/inward/record.url?scp=85118222936&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-88007-1_13

DO - 10.1007/978-3-030-88007-1_13

M3 - Conference contribution

AN - SCOPUS:85118222936

SN - 9783030880064

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 153

EP - 166

BT - Pattern Recognition and Computer Vision - 4th Chinese Conference, PRCV 2021, Proceedings

A2 - Ma, Huimin

A2 - Wang, Liang

A2 - Zhang, Changshui

A2 - Wu, Fei

A2 - Tan, Tieniu

A2 - Wang, Yaonan

A2 - Lai, Jianhuang

A2 - Zhao, Yao

PB - Springer Science and Business Media Deutschland GmbH

T2 - 4th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2021

Y2 - 29 October 2021 through 1 November 2021

ER -

Yang Y, Han J, Zhang D, Cheng D. Disentangling Deep Network for Reconstructing 3D Object Shapes from Single 2D Images. In Ma H, Wang L, Zhang C, Wu F, Tan T, Wang Y, Lai J, Zhao Y, editors, Pattern Recognition and Computer Vision - 4th Chinese Conference, PRCV 2021, Proceedings. Springer Science and Business Media Deutschland GmbH. 2021. p. 153-166. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-030-88007-1_13
