Robotic Grasp Detection Using Structure Prior Attention and Multiscale Features

Lu Chen, Mingdi Niu, Jing Yang, Yuhua Qian, Zhuomao Li, Keqi Wang, Tao Yan, Panfeng Huang

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)

Abstract

Most available grasp detection methods tend to directly predict grasp configurations with deep neural networks, where all features are extracted and utilized equally, restricting the truly useful grasping features. Inspired by the three-section structure pattern revealed by human-labeled graspable rectangles, we first design a structure prior attention (SPA) module that uses two-dimensional encoding to enhance local patterns and a self-attention mechanism to reallocate the distribution of grasping-specific features. The SPA module is then integrated with fundamental feature extraction modules and residual connections to achieve implicit and explicit feature fusion, and further serves as the building block of our Unet-like grasp detection network. The network takes RGBD images as input and outputs image-size feature maps, from which the grasp configurations are determined. Extensive comparative experiments on five public datasets demonstrate our method's superiority over other approaches in detection accuracy, achieving 99.2%, 96.1%, 98.0%, 86.7%, and 92.6% on the Cornell, Jacquard, Clutter, VMRD, and GraspNet datasets, respectively. Under visual evaluation metrics and a user study, the quality maps generated by our method show a more concentrated distribution of high-confidence grasps and clearer discrimination from the background. Its effectiveness is further verified by robotic grasping in real-world scenarios, yielding a higher success rate.
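As an illustration only (the module's actual design is given in the paper, not in this record), the core idea of reallocating grasping-specific features with self-attention can be sketched in plain NumPy. All shapes, names, and the use of the features themselves as query/key/value are assumptions made for this sketch, not the authors' implementation:

```python
import numpy as np

def self_attention_reallocate(features):
    """Toy self-attention over the spatial positions of a feature map.

    features: (N, C) array, N spatial positions with C channels.
    Returns a (N, C) array in which each position is a convex
    combination of all positions (row-wise softmax attention),
    i.e., feature mass is redistributed toward similar positions.
    """
    C = features.shape[1]
    # A real SPA-style module would use learned Q/K/V projections;
    # here the raw features stand in for all three for simplicity.
    scores = features @ features.T / np.sqrt(C)    # (N, N) similarities
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ features                      # reallocated features

feats = np.random.default_rng(0).normal(size=(16, 8))
out = self_attention_reallocate(feats)
print(out.shape)  # (16, 8)
```

In the paper's setting the attention output would be fused with the backbone features (implicitly and explicitly, via residual connections) rather than used on its own.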

Original language: English
Pages (from-to): 7039-7053
Number of pages: 15
Journal: IEEE Transactions on Systems, Man, and Cybernetics: Systems
Volume: 54
Issue number: 11
DOI
Publication status: Published - 2024
