Feature Combination Based on Receptive Fields and Cross-Fusion Feature Pyramid for Object Detection

Yongqiang Zhao; Yuan Rao; Shipeng Dong; Jiangnan Qi

doi:10.1007/978-3-030-36711-4_4

Xi'an Jiaotong University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Current state-of-the-art object detectors for images mainly depend on deep backbones, such as ResNet-50, DarkNet-53, ResNet-101 and DenseNet-169, which benefit from powerful feature representations but suffer from high computational cost. Building on a fast, lightweight backbone network (i.e., VGG-16), this paper improves feature representation by combining features of different receptive fields and cross-fusing feature pyramids, and establishes a fast and accurate detector. The architecture of our model, FC-CF Net, integrates two sub-modules: the FC module and the CF module. Inspired by the structure of receptive fields in the human visual system, we propose a novel Feature Combination Based on Receptive Fields module (FC module), which takes the relationship between the size and eccentricity of receptive fields into account and then combines them with the original features to increase the receptive field and information of the feature map. Furthermore, based on the structure of FPN (Feature Pyramid Network), we design a novel Cross-Fusion Feature Pyramid module (CF module), which combines top-down and bottom-up connections to fuse features across scales and produces high-level semantic feature maps at all scales. Extensive experiments on PASCAL VOC 2007 and 2012 demonstrate that FC-CF Net achieves state-of-the-art detection accuracy (i.e., 82.4% mAP, 80.5% mAP) with high efficiency (i.e., 69 FPS, 35 FPS).

Original language	English
Title of host publication	Neural Information Processing - 26th International Conference, ICONIP 2019, Proceedings
Editors	Tom Gedeon, Kok Wai Wong, Minho Lee
Publisher	Springer
Pages	37-49
Number of pages	13
ISBN (Print)	9783030367107
DOI	https://doi.org/10.1007/978-3-030-36711-4_4
Publication status	Published - 2019
Externally published	Yes
Event	26th International Conference on Neural Information Processing, ICONIP 2019 - Sydney, Australia (Duration: 12 Dec 2019 → 15 Dec 2019)

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	11954 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	26th International Conference on Neural Information Processing, ICONIP 2019
Country/Territory	Australia
City	Sydney
Period	12/12/19 → 15/12/19

Cite this

Zhao, Y., Rao, Y., Dong, S., & Qi, J. (2019). Feature Combination Based on Receptive Fields and Cross-Fusion Feature Pyramid for Object Detection. In T. Gedeon, K. W. Wong, & M. Lee (Eds.), Neural Information Processing - 26th International Conference, ICONIP 2019, Proceedings (pp. 37-49). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11954 LNCS). Springer. https://doi.org/10.1007/978-3-030-36711-4_4

Zhao, Yongqiang ; Rao, Yuan ; Dong, Shipeng et al. / Feature Combination Based on Receptive Fields and Cross-Fusion Feature Pyramid for Object Detection. Neural Information Processing - 26th International Conference, ICONIP 2019, Proceedings. ed. / Tom Gedeon ; Kok Wai Wong ; Minho Lee. Springer, 2019. pp. 37-49 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{af55f15907e64e38b0b139fa607517e6,

title = "Feature Combination Based on Receptive Fields and Cross-Fusion Feature Pyramid for Object Detection",

abstract = "Currently, the state-of-the-art method about object detector in image mainly depends on deep backbones, such as ResNet-50, DarkNet-53, ResNet-101 and DenseNet-169, which benefits for their powerful capability of feature representations but suffers from high computational cost. On the basis of fast lightweight backbone network (i.e., VGG-16), this paper improves the capability of feature representations by combining features of different receptive fields and cross-fusing feature pyramids, and finally establishes a fast and accurate detector. The architecture of our model is designed to integrate FC-CF Net with two sub-modules: FC module and CF module. Inspired by the structure of receptive fields in visual systems of human, we propose a novel method about Feature Combination Based on Receptive Fields module (FC module), which takes the relationship between the size and eccentricity of receptive fields into account, and then combine them with original features for increasing the receptive field and information of the feature map. Furthermore, based on the structure of FPN (Feature Pyramid Network), we design a novel Cross-Fusion Feature Pyramid module (CF module), which combines top-down and bottom-up connections to fuse features across scales, and achieves high-level semantic feature map at all scales. Extensive experiments on PASCAL VOC 2007 and 2012 demonstrate that FC-CF Net achieves state-of-the-art detection accuracy (i.e. 82.4% mAP, 80.5% mAP) with high efficiency (i.e. 69 FPS, 35 FPS).",

keywords = "Cross-fusion, Feature Combination, Feature pyramid, Receptive fields",

author = "Yongqiang Zhao and Yuan Rao and Shipeng Dong and Jiangnan Qi",

note = "Publisher Copyright: {\textcopyright} 2019, Springer Nature Switzerland AG.; 26th International Conference on Neural Information Processing, ICONIP 2019 ; Conference date: 12-12-2019 Through 15-12-2019",

year = "2019",

doi = "10.1007/978-3-030-36711-4_4",

language = "English",

isbn = "9783030367107",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer",

pages = "37--49",

editor = "Tom Gedeon and Wong, {Kok Wai} and Minho Lee",

booktitle = "Neural Information Processing - 26th International Conference, ICONIP 2019, Proceedings",

}

Zhao, Y, Rao, Y, Dong, S & Qi, J 2019, Feature Combination Based on Receptive Fields and Cross-Fusion Feature Pyramid for Object Detection. in T Gedeon, KW Wong & M Lee (eds), Neural Information Processing - 26th International Conference, ICONIP 2019, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11954 LNCS, Springer, pp. 37-49, 26th International Conference on Neural Information Processing, ICONIP 2019, Sydney, Australia, 12/12/19. https://doi.org/10.1007/978-3-030-36711-4_4

Feature Combination Based on Receptive Fields and Cross-Fusion Feature Pyramid for Object Detection. / Zhao, Yongqiang; Rao, Yuan; Dong, Shipeng et al.
Neural Information Processing - 26th International Conference, ICONIP 2019, Proceedings. ed. / Tom Gedeon; Kok Wai Wong; Minho Lee. Springer, 2019. pp. 37-49 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11954 LNCS).

TY - GEN

T1 - Feature Combination Based on Receptive Fields and Cross-Fusion Feature Pyramid for Object Detection

AU - Zhao, Yongqiang

AU - Rao, Yuan

AU - Dong, Shipeng

AU - Qi, Jiangnan

PY - 2019

Y1 - 2019

N2 - Currently, the state-of-the-art method about object detector in image mainly depends on deep backbones, such as ResNet-50, DarkNet-53, ResNet-101 and DenseNet-169, which benefits for their powerful capability of feature representations but suffers from high computational cost. On the basis of fast lightweight backbone network (i.e., VGG-16), this paper improves the capability of feature representations by combining features of different receptive fields and cross-fusing feature pyramids, and finally establishes a fast and accurate detector. The architecture of our model is designed to integrate FC-CF Net with two sub-modules: FC module and CF module. Inspired by the structure of receptive fields in visual systems of human, we propose a novel method about Feature Combination Based on Receptive Fields module (FC module), which takes the relationship between the size and eccentricity of receptive fields into account, and then combine them with original features for increasing the receptive field and information of the feature map. Furthermore, based on the structure of FPN (Feature Pyramid Network), we design a novel Cross-Fusion Feature Pyramid module (CF module), which combines top-down and bottom-up connections to fuse features across scales, and achieves high-level semantic feature map at all scales. Extensive experiments on PASCAL VOC 2007 and 2012 demonstrate that FC-CF Net achieves state-of-the-art detection accuracy (i.e. 82.4% mAP, 80.5% mAP) with high efficiency (i.e. 69 FPS, 35 FPS).

KW - Cross-fusion

KW - Feature Combination

KW - Feature pyramid

KW - Receptive fields

UR - http://www.scopus.com/inward/record.url?scp=85076882572&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-36711-4_4

DO - 10.1007/978-3-030-36711-4_4

M3 - Conference contribution

AN - SCOPUS:85076882572

SN - 9783030367107

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 37

EP - 49

BT - Neural Information Processing - 26th International Conference, ICONIP 2019, Proceedings

A2 - Gedeon, Tom

A2 - Wong, Kok Wai

A2 - Lee, Minho

PB - Springer

T2 - 26th International Conference on Neural Information Processing, ICONIP 2019

Y2 - 12 December 2019 through 15 December 2019

ER -

Zhao Y, Rao Y, Dong S, Qi J. Feature Combination Based on Receptive Fields and Cross-Fusion Feature Pyramid for Object Detection. In Gedeon T, Wong KW, Lee M, editors, Neural Information Processing - 26th International Conference, ICONIP 2019, Proceedings. Springer. 2019. p. 37-49. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-030-36711-4_4
