Scale-Balanced Real-Time Object Detection With Varying Input-Image Resolution

Longbin Yan; Yunxiao Qin; Jie Chen

doi:10.1109/TCSVT.2022.3198329

School of Marine Science and Technology

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

Current object-detection methods for small-scale objects are often marred by poor performance. Using relatively high-resolution input images can be considered a remedy for this issue, but it usually leads to performance degeneration for large-scale objects. We define this problem as the imbalance of detection performance for multi-scale objects when the resolution of input images varies. In addition, the use of high-resolution images results in significant computational resource consumption and inference-speed impairment. In this paper, we propose a friendly varying-resolution object-detection method for multi-scale objects. We analyze in detail the reasons leading to the performance degradation in the detection of large-scale objects with increasing input-image resolution, and propose a novel lightweight bidirectional feature-flow module to enhance the performance of multi-scale object detection in high-resolution images, especially for large-scale objects. The proposed approach can also ease the problems of computational resource consumption and inference-speed impairment caused by high-resolution images. Additionally, a decoupled detection head is designed to further improve performance by separating classification and regression sub-tasks, and an adaptive feature-fusion module is designed to better fuse different feature levels. The proposed scheme alleviates the negative effects of using high-resolution input images and achieves an excellent balance between inference speed and precision. Experiments on the MS COCO dataset show that the scheme achieves 44.6 AP at 42.6 FPS and 47 AP at 26.7 FPS, showing significant advantages over the methods to which it is compared.

Original language	English
Pages (from-to)	242-256
Number of pages	15
Journal	IEEE Transactions on Circuits and Systems for Video Technology
Volume	33
Issue number	1
DOIs	https://doi.org/10.1109/TCSVT.2022.3198329
State	Published - 1 Jan 2023

Keywords

Deep convolution neural network (CNN)
multi-scale features fusion
object detection

Cite this

@article{9df3b4b9471f46078cd484ef3f662266,

title = "Scale-Balanced Real-Time Object Detection With Varying Input-Image Resolution",

abstract = "Current object-detection methods for small-scale objects are often marred by poor performance. Using relatively high-resolution input images can be considered a remedy for this issue, but it usually leads to performance degeneration for large-scale objects. We define this problem as the imbalance of detection performance for multi-scale objects when the resolution of input images varies. In addition, the use of high-resolution images results in significant computational resource consumption and inference-speed impairment. In this paper, we propose a friendly varying-resolution object-detection method for multi-scale objects. We analyze in detail the reasons leading to the performance degradation in the detection of large-scale objects with increasing input-image resolution, and propose a novel lightweight bidirectional feature-flow module to enhance the performance of multi-scale object detection in high-resolution images, especially for large-scale objects. The proposed approach can also ease the problems of computational resource consumption and inference-speed impairment caused by high-resolution images. Additionally, a decoupled detection head is designed to further improve performance by separating classification and regression sub-tasks, and an adaptive feature-fusion module is designed to better fuse different feature levels. The proposed scheme alleviates the negative effects of using high-resolution input images and achieves an excellent balance between inference speed and precision. Experiments on the MS COCO dataset show that the scheme achieves 44.6 AP at 42.6 FPS and 47 AP at 26.7 FPS, showing significant advantages over the methods to which it is compared.",

keywords = "Deep convolution neural network (CNN), multi-scale features fusion, object detection",

author = "Longbin Yan and Yunxiao Qin and Jie Chen",

note = "Publisher Copyright: {\textcopyright} 1991-2012 IEEE.",

year = "2023",

month = jan,

day = "1",

doi = "10.1109/TCSVT.2022.3198329",

language = "English",

volume = "33",

pages = "242--256",

journal = "IEEE Transactions on Circuits and Systems for Video Technology",

issn = "1051-8215",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "1",

}

TY - JOUR

T1 - Scale-Balanced Real-Time Object Detection With Varying Input-Image Resolution

AU - Yan, Longbin

AU - Qin, Yunxiao

AU - Chen, Jie

PY - 2023/1/1

Y1 - 2023/1/1

N2 - Current object-detection methods for small-scale objects are often marred by poor performance. Using relatively high-resolution input images can be considered a remedy for this issue, but it usually leads to performance degeneration for large-scale objects. We define this problem as the imbalance of detection performance for multi-scale objects when the resolution of input images varies. In addition, the use of high-resolution images results in significant computational resource consumption and inference-speed impairment. In this paper, we propose a friendly varying-resolution object-detection method for multi-scale objects. We analyze in detail the reasons leading to the performance degradation in the detection of large-scale objects with increasing input-image resolution, and propose a novel lightweight bidirectional feature-flow module to enhance the performance of multi-scale object detection in high-resolution images, especially for large-scale objects. The proposed approach can also ease the problems of computational resource consumption and inference-speed impairment caused by high-resolution images. Additionally, a decoupled detection head is designed to further improve performance by separating classification and regression sub-tasks, and an adaptive feature-fusion module is designed to better fuse different feature levels. The proposed scheme alleviates the negative effects of using high-resolution input images and achieves an excellent balance between inference speed and precision. Experiments on the MS COCO dataset show that the scheme achieves 44.6 AP at 42.6 FPS and 47 AP at 26.7 FPS, showing significant advantages over the methods to which it is compared.

KW - Deep convolution neural network (CNN)

KW - multi-scale features fusion

KW - object detection

UR - http://www.scopus.com/inward/record.url?scp=85136858723&partnerID=8YFLogxK

U2 - 10.1109/TCSVT.2022.3198329

DO - 10.1109/TCSVT.2022.3198329

M3 - Article

AN - SCOPUS:85136858723

SN - 1051-8215

VL - 33

SP - 242

EP - 256

JO - IEEE Transactions on Circuits and Systems for Video Technology

JF - IEEE Transactions on Circuits and Systems for Video Technology

IS - 1

ER -
