TY - GEN
T1 - Optimal Kernel for Real-Time Arbitrary-Shaped Text Detection
AU - Ma, Haozhao
AU - Yang, Chuang
AU - Yuan, Yuan
AU - Wang, Qi
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Segmentation-based text detection methods have developed rapidly in recent years and achieve competitive accuracy and detection speed. However, these methods struggle to fit text instances accurately, which degrades model performance. Meanwhile, boundary pixels perceive the text center poorly, which further reduces detection accuracy. To address these issues, we design an efficient framework for arbitrary-shaped text detection built on an Optimal Kernel Representation (OKR) and a Pixel Enhancement Module (PEM). Specifically, OKR fits texts with optimal kernels: it erodes each text according to its geometric characteristics, which is simpler and more accurate than previous methods. PEM enhances the perception that boundary pixels have of the virtual character centers of the text, thus improving the cohesion of the whole instance. Notably, PEM participates only in training and therefore adds no extra computation cost at inference. Ablation experiments show the effectiveness of OKR and PEM. Comparisons on several benchmarks verify that our efficient detector is superior to existing state-of-the-art (SOTA) methods.
AB - Segmentation-based text detection methods have developed rapidly in recent years and achieve competitive accuracy and detection speed. However, these methods struggle to fit text instances accurately, which degrades model performance. Meanwhile, boundary pixels perceive the text center poorly, which further reduces detection accuracy. To address these issues, we design an efficient framework for arbitrary-shaped text detection built on an Optimal Kernel Representation (OKR) and a Pixel Enhancement Module (PEM). Specifically, OKR fits texts with optimal kernels: it erodes each text according to its geometric characteristics, which is simpler and more accurate than previous methods. PEM enhances the perception that boundary pixels have of the virtual character centers of the text, thus improving the cohesion of the whole instance. Notably, PEM participates only in training and therefore adds no extra computation cost at inference. Ablation experiments show the effectiveness of OKR and PEM. Comparisons on several benchmarks verify that our efficient detector is superior to existing state-of-the-art (SOTA) methods.
KW - Efficient text detector
KW - optimal kernel
KW - pixel enhancement
UR - http://www.scopus.com/inward/record.url?scp=85177572741&partnerID=8YFLogxK
U2 - 10.1109/ICASSP49357.2023.10094734
DO - 10.1109/ICASSP49357.2023.10094734
M3 - Conference contribution
AN - SCOPUS:85177572741
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
BT - ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Y2 - 4 June 2023 through 10 June 2023
ER -