Abstract
Self-supervised 3D point cloud understanding is crucial for scene understanding, and Masked Autoencoders (MAE) have achieved excellent performance in point cloud representation learning. However, existing MAE-style methods do not account for spatial-semantic variation in their masking strategies, and joint learning with multi-view images often overlooks view redundancy. To address these challenges, we propose KR-MAE, an MAE framework enhanced with reliable multi-view 2D-3D Key-part alignment and Reinforced masking. Our approach comprises three key innovations: (1) Reinforced Masking (RM) strategically samples visible tokens based on semantic saliency to enhance reconstruction fidelity; (2) the Reliable Multi-View Selector (RVS) dynamically refines the most informative image subset by filtering out occluded or low-texture views, mitigating detrimental redundancy; and (3) the Reliable-view 2D-3D Key-part Aligned Transformer (KAT) establishes semantically aligned correspondences between salient 3D point cloud parts and reliable multi-view 2D image patches, leveraging rich texture cues from 2D images to compensate for the sparse geometry of point clouds. Extensive experiments on 3D classification and segmentation benchmarks demonstrate that KR-MAE achieves state-of-the-art performance, surpassing prior multi-modal methods.
| Original language | English |
|---|---|
| Pages (from-to) | 5530-5538 |
| Number of pages | 9 |
| Journal | Proceedings of the AAAI Conference on Artificial Intelligence |
| Volume | 40 |
| Issue | 7 |
| DOI | |
| Publication status | Published - 2026 |
| Event | 40th AAAI Conference on Artificial Intelligence, AAAI 2026 - Singapore, Singapore. Duration: 20 Jan 2026 → 27 Jan 2026 |
| Title | Reliable-View 2D-3D Key-Part Aligned Transformer with Reinforced Masking for 3D Point Cloud Understanding |