TY - JOUR
T1 - Pull Pole Points to Text Contour by Magnetism
T2 - A Real-Time Scene Text Detector
AU - Han, Xu
AU - Yang, Chuang
AU - Wang, Qi
N1 - Publisher Copyright:
© 1992-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Scene text reading plays a crucial role in scene understanding. As its precondition task, scene text detection has garnered increasing interest from researchers. Segmentation-based text detection methods have gained prominence due to their adaptable pixel-level predictions. Many existing methods predict the shrink mask and utilize the Vatti clipping algorithm to reconstruct text contours. However, the shrink mask only focuses on the global geometry feature and shrinks the same distance everywhere, which neglects local contour information and disrupts the instance shape feature. In addition, the post-processing based on the Vatti clipping algorithm heavily relies on the predictions and is relatively complex, causing suboptimal performance in both detection accuracy and efficiency. To address the above problems, we propose an efficient and effective method named Magnetic Text Detector (MTD), inspired by magnetism. It is constructed by a text representation method flexible mask (FM) and a magnetic pull module (MPM). Unlike the shrink mask and concentric mask, the former concerns the local contours and shrinks unfixed distances on different positions, which avoids the truncation issue while preserving distinctiveness from the text regions. The latter generates magnetic fields and pulls pole points of FM to the text contour by magnetism. This allows accurate reconstruction of text contours, even when predictions deviate from the actual text severely, while saving 50% of the post-processing time approximately. Several ablation studies verify the effectiveness of the proposed FM and MPM. Extensive experiments show that our MTD achieves state-of-the-art (SOTA) methods on multiple datasets from different scenes.
AB - Scene text reading plays a crucial role in scene understanding. As its precondition task, scene text detection has garnered increasing interest from researchers. Segmentation-based text detection methods have gained prominence due to their adaptable pixel-level predictions. Many existing methods predict the shrink mask and utilize the Vatti clipping algorithm to reconstruct text contours. However, the shrink mask only focuses on the global geometry feature and shrinks the same distance everywhere, which neglects local contour information and disrupts the instance shape feature. In addition, the post-processing based on the Vatti clipping algorithm heavily relies on the predictions and is relatively complex, causing suboptimal performance in both detection accuracy and efficiency. To address the above problems, we propose an efficient and effective method named Magnetic Text Detector (MTD), inspired by magnetism. It is constructed by a text representation method flexible mask (FM) and a magnetic pull module (MPM). Unlike the shrink mask and concentric mask, the former concerns the local contours and shrinks unfixed distances on different positions, which avoids the truncation issue while preserving distinctiveness from the text regions. The latter generates magnetic fields and pulls pole points of FM to the text contour by magnetism. This allows accurate reconstruction of text contours, even when predictions deviate from the actual text severely, while saving 50% of the post-processing time approximately. Several ablation studies verify the effectiveness of the proposed FM and MPM. Extensive experiments show that our MTD achieves state-of-the-art (SOTA) methods on multiple datasets from different scenes.
KW - Real-time
KW - magnetic
KW - multi-scene
KW - text detection
UR - https://www.scopus.com/pages/publications/105016639892
U2 - 10.1109/TIP.2025.3609196
DO - 10.1109/TIP.2025.3609196
M3 - 文章
AN - SCOPUS:105016639892
SN - 1057-7149
VL - 34
SP - 6374
EP - 6385
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
ER -