TY - JOUR
T1 - Learning 2D Invariant Affordance Knowledge for 3D Affordance Grounding
AU - Gao, Xianqiang
AU - Zhang, Pingrui
AU - Qu, Delin
AU - Wang, Dong
AU - Wang, Zhigang
AU - Ding, Yan
AU - Zhao, Bin
N1 - Publisher Copyright:
© 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2025/4/11
Y1 - 2025/4/11
N2 - 3D Object Affordance Grounding aims to predict the functional regions on a 3D object and lays the foundation for a wide range of applications in robotics. Recent advances tackle this problem by learning a mapping between 3D regions and a single human-object interaction image. However, the geometric structure of the 3D object and that of the object in the human-object interaction image are not always consistent, leading to poor generalization. To address this issue, we propose to learn generalizable invariant affordance knowledge from multiple human-object interaction images within the same affordance category. Specifically, we introduce the Multi-Image Guided Invariant-Feature-Aware 3D Affordance Grounding (MIFAG) framework. It grounds 3D object affordance regions by identifying common interaction patterns across multiple human-object interaction images. First, the Invariant Affordance Knowledge Extraction Module (IAM) utilizes an iterative updating strategy to gradually extract aligned affordance knowledge from multiple images and integrate it into an affordance dictionary. Then, the Affordance Dictionary Adaptive Fusion Module (ADM) learns comprehensive point cloud representations that consider all affordance candidates in the multiple images. In addition, we construct the Multi-Image and Point Affordance (MIPA) benchmark, on which our method outperforms existing state-of-the-art methods in various experimental comparisons.
AB - 3D Object Affordance Grounding aims to predict the functional regions on a 3D object and lays the foundation for a wide range of applications in robotics. Recent advances tackle this problem by learning a mapping between 3D regions and a single human-object interaction image. However, the geometric structure of the 3D object and that of the object in the human-object interaction image are not always consistent, leading to poor generalization. To address this issue, we propose to learn generalizable invariant affordance knowledge from multiple human-object interaction images within the same affordance category. Specifically, we introduce the Multi-Image Guided Invariant-Feature-Aware 3D Affordance Grounding (MIFAG) framework. It grounds 3D object affordance regions by identifying common interaction patterns across multiple human-object interaction images. First, the Invariant Affordance Knowledge Extraction Module (IAM) utilizes an iterative updating strategy to gradually extract aligned affordance knowledge from multiple images and integrate it into an affordance dictionary. Then, the Affordance Dictionary Adaptive Fusion Module (ADM) learns comprehensive point cloud representations that consider all affordance candidates in the multiple images. In addition, we construct the Multi-Image and Point Affordance (MIPA) benchmark, on which our method outperforms existing state-of-the-art methods in various experimental comparisons.
UR - http://www.scopus.com/inward/record.url?scp=105003996314&partnerID=8YFLogxK
U2 - 10.1609/aaai.v39i3.32318
DO - 10.1609/aaai.v39i3.32318
M3 - Conference article
AN - SCOPUS:105003996314
SN - 2159-5399
VL - 39
SP - 3095
EP - 3103
JO - Proceedings of the AAAI Conference on Artificial Intelligence
JF - Proceedings of the AAAI Conference on Artificial Intelligence
IS - 3
T2 - 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025
Y2 - 25 February 2025 through 4 March 2025
ER -