HandFormer: Hand pose reconstructing from a single RGB image

Zixun Jiao, Xihan Wang, Jingcao Li, Rongxin Gao, Miao He, Jiao Liang, Zhaoqiang Xia, Quanli Gao

Research output: Contribution to journal › Article › peer-review


Abstract

We propose a multi-task progressive Transformer framework that reconstructs hand poses from a single RGB image, addressing challenges such as hand occlusion, hand distraction, and hand shape bias. The framework comprises three key components: a feature extraction branch, a palm segmentation branch, and a parameter prediction branch. The feature extraction branch first employs the progressive Transformer to extract multi-scale features from the input image. These multi-scale features are then fed into a multi-layer perceptron (MLP) to obtain palm alignment features. An efficient fusion module integrates the palm alignment features with the backbone features, further enhancing the features used for parameter prediction. A dense hand model is generated from a pre-computed articulated, deformable mesh hand model. We evaluate our method on the STEREO, FreiHAND, and HO3D datasets. The experimental results show that our approach achieves 3D mean errors of 10.92 mm, 12.33 mm, and 9.6 mm on the respective datasets.
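The abstract describes a three-branch pipeline: a progressive Transformer backbone producing multi-scale features, an MLP yielding palm alignment features, a palm segmentation branch, and a fusion module feeding a parameter prediction head. The paper does not release code; the sketch below is a minimal, illustrative PyTorch wiring of such a pipeline. All module names (e.g. `ProgressiveTransformerBackbone`, `HandFormerSketch`), feature dimensions, the concatenation-based fusion, and the assumed parameter count are our own assumptions, not the authors' implementation.

```python
# Minimal sketch of a HandFormer-style three-branch pipeline (assumed design).
import torch
import torch.nn as nn

class ProgressiveTransformerBackbone(nn.Module):
    """Stand-in for the progressive Transformer feature extractor."""
    def __init__(self, dim=256):
        super().__init__()
        self.stem = nn.Conv2d(3, dim, kernel_size=16, stride=16)  # patchify
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, img):                    # img: (B, 3, 224, 224)
        x = self.stem(img)                     # (B, dim, 14, 14)
        tokens = x.flatten(2).transpose(1, 2)  # (B, 196, dim)
        return self.encoder(tokens)            # backbone features

class HandFormerSketch(nn.Module):
    def __init__(self, dim=256, n_hand_params=61):  # 61 is an assumed size
        super().__init__()
        self.backbone = ProgressiveTransformerBackbone(dim)
        # Palm segmentation branch: per-token palm/background logits.
        self.palm_seg = nn.Linear(dim, 2)
        # MLP producing palm alignment features from pooled tokens.
        self.palm_mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        # Fusion of palm alignment and backbone features
        # (simple concatenation + projection as a placeholder).
        self.fuse = nn.Linear(2 * dim, dim)
        # Parameter prediction branch: pose/shape parameters of an
        # articulated, deformable hand mesh model.
        self.param_head = nn.Linear(dim, n_hand_params)

    def forward(self, img):
        feats = self.backbone(img)             # (B, N, dim)
        seg_logits = self.palm_seg(feats)      # (B, N, 2) palm mask logits
        pooled = feats.mean(dim=1)             # global backbone feature
        palm_feat = self.palm_mlp(pooled)      # palm alignment feature
        fused = self.fuse(torch.cat([pooled, palm_feat], dim=-1))
        return self.param_head(fused), seg_logits

model = HandFormerSketch()
params, seg = model(torch.randn(1, 3, 224, 224))
print(params.shape, seg.shape)  # torch.Size([1, 61]) torch.Size([1, 196, 2])
```

In this sketch the segmentation logits would supervise the palm segmentation branch during multi-task training, while the predicted parameters would drive the pre-computed deformable hand mesh; both loss terms and the mesh deformation step are omitted as they are not specified in the abstract.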

Original language: English
Pages (from-to): 155-164
Number of pages: 10
Journal: Pattern Recognition Letters
Volume: 183
DOIs
State: Published - Jul 2024

Keywords

  • Hand pose estimation
  • Hand pose estimation and segmentation
  • Multi-scale features
  • Multi-task progressive Transformer framework
  • Multi-task learning
