TY - GEN
T1 - Point-DMAE: Density-directed Masked Autoencoders for Point Cloud Self-supervised Learning
T2 - 34th ACM International Conference on Information and Knowledge Management, CIKM 2025
AU - Jin, Xianglong
AU - Wang, Zheng
AU - Zheng, Wenjie
AU - Nie, Feiping
N1 - Publisher Copyright:
© 2025 ACM.
PY - 2025/11/10
Y1 - 2025/11/10
N2 - Masked autoencoders have been extensively utilized in 3D point cloud self-supervised learning, where the fundamental approach involves masking a portion of the point cloud and subsequently reconstructing it. This process is hypothesized to enhance model learning by leveraging the inherent structure of the point cloud data. However, the information density within point clouds is inherently uneven, contrasting with the more uniform distributions found in language and 2D image data. This uneven distribution suggests that the application of random masking strategies, commonly adopted from NLP and 2D vision, may not be optimal for point cloud data, potentially leading to suboptimal learning outcomes. Based on this observation, we propose a simple yet effective Density-directed Masked Autoencoder for Point Cloud Self-supervised Learning (Point-DMAE), which learns latent semantic point cloud features using a density-directed masking strategy. Specifically, our method employs a dual-branch Transformer architecture to extract both high-level and fine-grained point features through global and local block density-directed masking, respectively. Point-DMAE demonstrates high pre-training efficiency and significantly outperforms our baseline (Point-MAE) on 3D object classification tasks within the ScanObjectNN dataset by 4.13% on OBJ-BG, 5.17% on OBJ-ONLY, and 4.17% on PB-T50-RS. Code is available at https://github.com/jinxianglong10/Point-DMAE.
AB - Masked autoencoders have been extensively utilized in 3D point cloud self-supervised learning, where the fundamental approach involves masking a portion of the point cloud and subsequently reconstructing it. This process is hypothesized to enhance model learning by leveraging the inherent structure of the point cloud data. However, the information density within point clouds is inherently uneven, contrasting with the more uniform distributions found in language and 2D image data. This uneven distribution suggests that the application of random masking strategies, commonly adopted from NLP and 2D vision, may not be optimal for point cloud data, potentially leading to suboptimal learning outcomes. Based on this observation, we propose a simple yet effective Density-directed Masked Autoencoder for Point Cloud Self-supervised Learning (Point-DMAE), which learns latent semantic point cloud features using a density-directed masking strategy. Specifically, our method employs a dual-branch Transformer architecture to extract both high-level and fine-grained point features through global and local block density-directed masking, respectively. Point-DMAE demonstrates high pre-training efficiency and significantly outperforms our baseline (Point-MAE) on 3D object classification tasks within the ScanObjectNN dataset by 4.13% on OBJ-BG, 5.17% on OBJ-ONLY, and 4.17% on PB-T50-RS. Code is available at https://github.com/jinxianglong10/Point-DMAE.
KW - density-directed masking
KW - masked autoencoders
KW - point cloud reconstruction
KW - point cloud self-supervised learning
KW - shape classification
UR - https://www.scopus.com/pages/publications/105023168564
U2 - 10.1145/3746252.3761017
DO - 10.1145/3746252.3761017
M3 - Conference contribution
AN - SCOPUS:105023168564
T3 - CIKM 2025 - Proceedings of the 34th ACM International Conference on Information and Knowledge Management
SP - 1231
EP - 1238
BT - CIKM 2025 - Proceedings of the 34th ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery, Inc
Y2 - 10 November 2025 through 14 November 2025
ER -