TY - GEN
T1 - Feature Combination Based on Receptive Fields and Cross-Fusion Feature Pyramid for Object Detection
AU - Zhao, Yongqiang
AU - Rao, Yuan
AU - Dong, Shipeng
AU - Qi, Jiangnan
N1 - Publisher Copyright: © 2019, Springer Nature Switzerland AG.
PY - 2019
Y1 - 2019
N2 - Currently, state-of-the-art object detectors for images mainly depend on deep backbones, such as ResNet-50, DarkNet-53, ResNet-101 and DenseNet-169, which benefit from powerful feature representations but suffer from high computational cost. Building on a fast, lightweight backbone network (i.e., VGG-16), this paper improves the capability of feature representations by combining features of different receptive fields and cross-fusing feature pyramids, and finally establishes a fast and accurate detector. Our model, FC-CF Net, integrates two sub-modules: the FC module and the CF module. Inspired by the structure of receptive fields in the human visual system, we propose a novel Feature Combination Based on Receptive Fields module (FC module), which takes the relationship between the size and eccentricity of receptive fields into account and combines features of different receptive fields with the original features to increase the receptive field and information content of the feature map. Furthermore, based on the structure of FPN (Feature Pyramid Network), we design a novel Cross-Fusion Feature Pyramid module (CF module), which combines top-down and bottom-up connections to fuse features across scales and obtains high-level semantic feature maps at all scales. Extensive experiments on PASCAL VOC 2007 and 2012 demonstrate that FC-CF Net achieves state-of-the-art detection accuracy (i.e., 82.4% mAP, 80.5% mAP) with high efficiency (i.e., 69 FPS, 35 FPS).
AB - Currently, state-of-the-art object detectors for images mainly depend on deep backbones, such as ResNet-50, DarkNet-53, ResNet-101 and DenseNet-169, which benefit from powerful feature representations but suffer from high computational cost. Building on a fast, lightweight backbone network (i.e., VGG-16), this paper improves the capability of feature representations by combining features of different receptive fields and cross-fusing feature pyramids, and finally establishes a fast and accurate detector. Our model, FC-CF Net, integrates two sub-modules: the FC module and the CF module. Inspired by the structure of receptive fields in the human visual system, we propose a novel Feature Combination Based on Receptive Fields module (FC module), which takes the relationship between the size and eccentricity of receptive fields into account and combines features of different receptive fields with the original features to increase the receptive field and information content of the feature map. Furthermore, based on the structure of FPN (Feature Pyramid Network), we design a novel Cross-Fusion Feature Pyramid module (CF module), which combines top-down and bottom-up connections to fuse features across scales and obtains high-level semantic feature maps at all scales. Extensive experiments on PASCAL VOC 2007 and 2012 demonstrate that FC-CF Net achieves state-of-the-art detection accuracy (i.e., 82.4% mAP, 80.5% mAP) with high efficiency (i.e., 69 FPS, 35 FPS).
KW - Cross-fusion
KW - Feature Combination
KW - Feature pyramid
KW - Receptive fields
UR - http://www.scopus.com/inward/record.url?scp=85076882572&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-36711-4_4
DO - 10.1007/978-3-030-36711-4_4
M3 - Conference contribution
AN - SCOPUS:85076882572
SN - 9783030367107
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 37
EP - 49
BT - Neural Information Processing - 26th International Conference, ICONIP 2019, Proceedings
A2 - Gedeon, Tom
A2 - Wong, Kok Wai
A2 - Lee, Minho
PB - Springer
T2 - 26th International Conference on Neural Information Processing, ICONIP 2019
Y2 - 12 December 2019 through 15 December 2019
ER -