TY - JOUR
T1 - HandFormer
T2 - Hand pose reconstructing from a single RGB image
AU - Jiao, Zixun
AU - Wang, Xihan
AU - Li, Jingcao
AU - Gao, Rongxin
AU - He, Miao
AU - Liang, Jiao
AU - Xia, Zhaoqiang
AU - Gao, Quanli
N1 - Publisher Copyright:
© 2024
PY - 2024/7
Y1 - 2024/7
N2 - We propose a multi-task progressive Transformer framework that reconstructs hand poses from a single RGB image, addressing challenges such as hand occlusion, hand distraction, and hand shape bias. The framework comprises three key components: a feature extraction branch, a palm segmentation branch, and a parameter prediction branch. The feature extraction branch first employs the progressive Transformer to extract multi-scale features from the input image. These multi-scale features are then fed into a multi-layer perceptron (MLP) to obtain palm alignment features. An efficient fusion module integrates the palm alignment features with the backbone features to further enhance the parameter prediction features. A dense hand model is generated using a pre-computed articulated, mesh-deformed hand model. We evaluate the proposed method on the STEREO, FreiHAND, and HO3D datasets. The experimental results demonstrate that our approach achieves 3D mean error metrics of 10.92 mm, 12.33 mm, and 9.6 mm on the respective datasets.
AB - We propose a multi-task progressive Transformer framework that reconstructs hand poses from a single RGB image, addressing challenges such as hand occlusion, hand distraction, and hand shape bias. The framework comprises three key components: a feature extraction branch, a palm segmentation branch, and a parameter prediction branch. The feature extraction branch first employs the progressive Transformer to extract multi-scale features from the input image. These multi-scale features are then fed into a multi-layer perceptron (MLP) to obtain palm alignment features. An efficient fusion module integrates the palm alignment features with the backbone features to further enhance the parameter prediction features. A dense hand model is generated using a pre-computed articulated, mesh-deformed hand model. We evaluate the proposed method on the STEREO, FreiHAND, and HO3D datasets. The experimental results demonstrate that our approach achieves 3D mean error metrics of 10.92 mm, 12.33 mm, and 9.6 mm on the respective datasets.
KW - Hand pose estimation
KW - Hand pose estimation and segmentation
KW - Multi-scale features
KW - Multi-task progressive Transformer framework
KW - Multi-task learning
UR - http://www.scopus.com/inward/record.url?scp=85194392255&partnerID=8YFLogxK
U2 - 10.1016/j.patrec.2024.05.019
DO - 10.1016/j.patrec.2024.05.019
M3 - Article
AN - SCOPUS:85194392255
SN - 0167-8655
VL - 183
SP - 155
EP - 164
JO - Pattern Recognition Letters
JF - Pattern Recognition Letters
ER -