Mono3DVG: 3D Visual Grounding in Monocular Images

Yang Zhan, Yuan Yuan, Zhitong Xiong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

We introduce a novel task of 3D visual grounding in monocular RGB images using language descriptions with both appearance and geometry information. Specifically, we build a large-scale dataset, Mono3DRefer, which contains 3D object targets with their corresponding geometric text descriptions, generated by ChatGPT and refined manually. To foster this task, we propose Mono3DVG-TR, an end-to-end transformer-based network, which takes advantage of both the appearance and geometry information in text embeddings for multi-modal learning and 3D object localization. A depth predictor is designed to explicitly learn geometry features. A dual text-guided adapter is proposed to refine multiscale visual and geometry features of the referred object. Based on depth-text-visual stacking attention, the decoder fuses object-level geometric cues and visual appearance into a learnable query. Comprehensive benchmarks and some insightful analyses are provided for Mono3DVG. Extensive comparisons and ablation studies show that our method significantly outperforms all baselines. The dataset and code will be released.

Original language: English
Title of host publication: Technical Tracks 14
Editors: Michael Wooldridge, Jennifer Dy, Sriraam Natarajan
Publisher: Association for the Advancement of Artificial Intelligence
Pages: 6988-6996
Number of pages: 9
Edition: 7
ISBN (Electronic): 1577358872, 9781577358879
DOI
Publication status: Published - 25 Mar 2024
Event: 38th AAAI Conference on Artificial Intelligence, AAAI 2024 - Vancouver, Canada
Duration: 20 Feb 2024 to 27 Feb 2024

Publication series

Name: Proceedings of the AAAI Conference on Artificial Intelligence
Number: 7
Volume: 38
ISSN (Print): 2159-5399
ISSN (Electronic): 2374-3468

Conference

Conference: 38th AAAI Conference on Artificial Intelligence, AAAI 2024
Country/Territory: Canada
City: Vancouver
Period: 20/02/24 to 27/02/24
