High-resolution network with an auxiliary channel for 2D hand pose estimation

Tianhong Pan; Zheng Wang

doi:10.1007/s11042-023-16045-x

Anhui University

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

High-resolution networks have been applied in various fields because of their advanced architecture. However, repeated multi-scale fusion of high- and low-dimensional semantic information during hand pose estimation can blur the position information obtained at high resolution, causing overfitting. To address this problem, we added an auxiliary channel parallel to the original network. The auxiliary channel downscales images with a slicing operation instead of a convolutional downscaling operation, which preserves the full information in the input. The sliced input is then passed through four convolution layers to obtain initial position correction information, and the results are combined with the output of the original network for prediction. Adding the auxiliary channel increases the number of parameters in the original network by only 0.7% but yields a high accuracy gain, which is particularly noticeable on lightweight networks. We performed several experiments on multiple datasets to verify the effectiveness of this method.
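The abstract does not spell out the slicing operation, so the following is only an illustrative sketch, assuming it resembles a stride-2 space-to-depth rearrangement: each 2 x 2 block of pixels is moved into the channel axis, halving height and width without discarding any value. The function name `slice_downsample` and the `stride` parameter are hypothetical and not taken from the paper; the four convolution layers that follow are not shown.

```python
def slice_downsample(image, stride=2):
    """Rearrange an H x W x C image (nested lists) into
    (H/stride) x (W/stride) x (C*stride*stride) without dropping values.

    Assumption: this space-to-depth layout stands in for the paper's
    slicing operation; the exact layout used by the authors is not
    given in the abstract.
    """
    height, width = len(image), len(image[0])
    if height % stride or width % stride:
        raise ValueError("height and width must be divisible by stride")
    out = []
    for y in range(0, height, stride):
        row = []
        for x in range(0, width, stride):
            pixel = []
            # Gather every value of the stride x stride block into channels.
            for dy in range(stride):
                for dx in range(stride):
                    pixel.extend(image[y + dy][x + dx])
            row.append(pixel)
        out.append(row)
    return out


# 4 x 4 single-channel image holding the values 0..15.
image = [[[4 * r + c] for c in range(4)] for r in range(4)]
sliced = slice_downsample(image)
print(len(sliced), len(sliced[0]), len(sliced[0][0]))  # spatial size halves, channels x4
```

Unlike a strided convolution, which mixes neighbouring pixels through learned weights, this rearrangement is lossless: every input value appears exactly once in the output, which is one way to read the abstract's claim that slicing preserves the full information in the input.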

Original language	English
Pages (from-to)	36683-36694
Number of pages	12
Journal	Multimedia Tools and Applications
Volume	83
Issue number	12
DOIs	https://doi.org/10.1007/s11042-023-16045-x
State	Published - Apr 2024
Externally published	Yes

Keywords

High-resolution network (HRnet)
auxiliary channel
multi-scale integration
slice operation

Cite this

@article{0fb712f3f79846bf916f44714c266afc,

title = "High-resolution network with an auxiliary channel for 2D hand pose estimation",

abstract = "High-resolution networks have been applied in various fields because of their advanced architecture. However, multiple multi-scale fusions of high and low-dimensional semantic information during hand pose estimation can blur the position information obtained in high resolution, causing overfitting. To address this problem, we added an auxiliary channel parallel to the original network in this study. The auxiliary channel slices images using a slicing operation instead of a convolutional downscaling operation to preserve the full information in the input. The input is then computed by following four convolution layers to obtain the initial position correction information, and the results are combined with the network for prediction. Adding the auxiliary channel increases the number of parameters in the original network by only 0.7%, but obtains a high accuracy gain, which is particularly noticeable on lightweight networks. We performed several experiments to verify the effectiveness of this method using multiple datasets.",

keywords = "High-resolution network (HRnet), auxiliary channel, multi-scale integration, slice operation",

author = "Tianhong Pan and Zheng Wang",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023.",

year = "2024",

month = apr,

doi = "10.1007/s11042-023-16045-x",

language = "English",

volume = "83",

pages = "36683--36694",

journal = "Multimedia Tools and Applications",

issn = "1380-7501",

publisher = "Springer",

number = "12",

}

TY - JOUR

T1 - High-resolution network with an auxiliary channel for 2D hand pose estimation

AU - Pan, Tianhong

AU - Wang, Zheng

N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023.

PY - 2024/4

Y1 - 2024/4

N2 - High-resolution networks have been applied in various fields because of their advanced architecture. However, multiple multi-scale fusions of high and low-dimensional semantic information during hand pose estimation can blur the position information obtained in high resolution, causing overfitting. To address this problem, we added an auxiliary channel parallel to the original network in this study. The auxiliary channel slices images using a slicing operation instead of a convolutional downscaling operation to preserve the full information in the input. The input is then computed by following four convolution layers to obtain the initial position correction information, and the results are combined with the network for prediction. Adding the auxiliary channel increases the number of parameters in the original network by only 0.7%, but obtains a high accuracy gain, which is particularly noticeable on lightweight networks. We performed several experiments to verify the effectiveness of this method using multiple datasets.

AB - High-resolution networks have been applied in various fields because of their advanced architecture. However, multiple multi-scale fusions of high and low-dimensional semantic information during hand pose estimation can blur the position information obtained in high resolution, causing overfitting. To address this problem, we added an auxiliary channel parallel to the original network in this study. The auxiliary channel slices images using a slicing operation instead of a convolutional downscaling operation to preserve the full information in the input. The input is then computed by following four convolution layers to obtain the initial position correction information, and the results are combined with the network for prediction. Adding the auxiliary channel increases the number of parameters in the original network by only 0.7%, but obtains a high accuracy gain, which is particularly noticeable on lightweight networks. We performed several experiments to verify the effectiveness of this method using multiple datasets.

KW - High-resolution network (HRnet)

KW - auxiliary channel

KW - multi-scale integration

KW - slice operation

UR - http://www.scopus.com/inward/record.url?scp=85162704586&partnerID=8YFLogxK

U2 - 10.1007/s11042-023-16045-x

DO - 10.1007/s11042-023-16045-x

M3 - Article

AN - SCOPUS:85162704586

SN - 1380-7501

VL - 83

SP - 36683

EP - 36694

JO - Multimedia Tools and Applications

JF - Multimedia Tools and Applications

IS - 12

ER -
