TY - JOUR
T1 - Learning 2D Invariant Affordance Knowledge for 3D Affordance Grounding
AU - Gao, Xianqiang
AU - Zhang, Pingrui
AU - Qu, Delin
AU - Wang, Dong
AU - Wang, Zhigang
AU - Ding, Yan
AU - Zhao, Bin
N1 - Publisher Copyright:
© 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2025/4/11
Y1 - 2025/4/11
N2 - 3D Object Affordance Grounding aims to predict the functional regions on a 3D object and lays the foundation for a wide range of applications in robotics. Recent advances tackle this problem by learning a mapping between 3D regions and a single human-object interaction image. However, the geometric structure of the 3D object and that of the object in the human-object interaction image are not always consistent, leading to poor generalization. To address this issue, we propose to learn generalizable invariant affordance knowledge from multiple human-object interaction images within the same affordance category. Specifically, we introduce the Multi-Image Guided Invariant-Feature-Aware 3D Affordance Grounding (MIFAG) framework. It grounds 3D object affordance regions by identifying common interaction patterns across multiple human-object interaction images. First, the Invariant Affordance Knowledge Extraction Module (IAM) utilizes an iterative updating strategy to gradually extract aligned affordance knowledge from multiple images and integrate it into an affordance dictionary. Then, the Affordance Dictionary Adaptive Fusion Module (ADM) learns comprehensive point cloud representations that consider all affordance candidates in the multiple images. In addition, we construct the Multi-Image and Point Affordance (MIPA) benchmark, on which our method outperforms existing state-of-the-art methods in various experimental comparisons.
AB - 3D Object Affordance Grounding aims to predict the functional regions on a 3D object and lays the foundation for a wide range of applications in robotics. Recent advances tackle this problem by learning a mapping between 3D regions and a single human-object interaction image. However, the geometric structure of the 3D object and that of the object in the human-object interaction image are not always consistent, leading to poor generalization. To address this issue, we propose to learn generalizable invariant affordance knowledge from multiple human-object interaction images within the same affordance category. Specifically, we introduce the Multi-Image Guided Invariant-Feature-Aware 3D Affordance Grounding (MIFAG) framework. It grounds 3D object affordance regions by identifying common interaction patterns across multiple human-object interaction images. First, the Invariant Affordance Knowledge Extraction Module (IAM) utilizes an iterative updating strategy to gradually extract aligned affordance knowledge from multiple images and integrate it into an affordance dictionary. Then, the Affordance Dictionary Adaptive Fusion Module (ADM) learns comprehensive point cloud representations that consider all affordance candidates in the multiple images. In addition, we construct the Multi-Image and Point Affordance (MIPA) benchmark, on which our method outperforms existing state-of-the-art methods in various experimental comparisons.
UR - http://www.scopus.com/inward/record.url?scp=105003996314&partnerID=8YFLogxK
U2 - 10.1609/aaai.v39i3.32318
DO - 10.1609/aaai.v39i3.32318
M3 - Conference article
AN - SCOPUS:105003996314
SN - 2159-5399
VL - 39
SP - 3095
EP - 3103
JO - Proceedings of the AAAI Conference on Artificial Intelligence
JF - Proceedings of the AAAI Conference on Artificial Intelligence
IS - 3
T2 - 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025
Y2 - 25 February 2025 through 4 March 2025
ER -