HandFormer: Hand pose reconstructing from a single RGB image

Zixun Jiao, Xihan Wang, Jingcao Li, Rongxin Gao, Miao He, Jiao Liang, Zhaoqiang Xia, Quanli Gao

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

We propose a multi-task progressive Transformer framework that reconstructs hand poses from a single RGB image, addressing challenges such as hand occlusion, hand distraction, and hand shape bias. The framework comprises three key components: a feature extraction branch, a palm segmentation branch, and a parameter prediction branch. The feature extraction branch first employs the progressive Transformer to extract multi-scale features from the input image. These multi-scale features are then fed into a multi-layer perceptron (MLP) to acquire palm alignment features. An efficient fusion module integrates the palm alignment features with the backbone features to further enhance the parameter prediction features. A dense hand model is generated using a pre-computed articulated mesh-deformed hand model. We evaluate the proposed method on the STEREO, FreiHAND, and HO3D datasets separately. The experimental results show that our approach achieves 3D mean error metrics of 10.92 mm, 12.33 mm, and 9.6 mm on the respective datasets.
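The fusion step described in the abstract, combining MLP-derived palm alignment features with backbone features, might be sketched as below. All shapes, layer sizes, and names are illustrative assumptions; this is not the authors' implementation, only a minimal stand-in for the idea of projecting and concatenating the two feature streams.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w, b):
    # One-layer perceptron with ReLU, standing in for the MLP that
    # maps multi-scale features to palm alignment features.
    return np.maximum(x @ w + b, 0.0)

# Hypothetical feature dimensions (not taken from the paper).
backbone_feat = rng.standard_normal((1, 256))     # progressive Transformer output
multi_scale_feat = rng.standard_normal((1, 256))  # multi-scale features

# Palm alignment features produced by the MLP layer.
w_a, b_a = rng.standard_normal((256, 128)), np.zeros(128)
palm_align_feat = mlp(multi_scale_feat, w_a, b_a)

# A simple fusion-module sketch: project the backbone features to the
# same width, then concatenate the two streams for parameter prediction.
w_b, b_b = rng.standard_normal((256, 128)), np.zeros(128)
proj_backbone = mlp(backbone_feat, w_b, b_b)
fused = np.concatenate([palm_align_feat, proj_backbone], axis=-1)

print(fused.shape)  # fused features fed to the parameter prediction branch
```

In practice the paper's fusion module is likely learned end-to-end inside the Transformer framework; this sketch only shows the data flow between the branches.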

Original language: English
Pages (from-to): 155-164
Number of pages: 10
Journal: Pattern Recognition Letters
Volume: 183
DOI
Publication status: Published - Jul 2024
