TY - JOUR
T1 - Zoom Text Detector
AU - Yang, Chuang
AU - Chen, Mulin
AU - Yuan, Yuan
AU - Wang, Qi
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2024
Y1 - 2024
N2 - To pursue comprehensive performance, recent text detectors improve detection speed at the expense of accuracy. They adopt shrink-mask-based text representation strategies, which makes detection accuracy highly dependent on shrink-masks. Unfortunately, three disadvantages make shrink-masks unreliable. Specifically, these methods try to strengthen the discrimination of shrink-masks from the background using semantic information. However, feature defocusing, in which coarse layers are optimized by fine-grained objectives, limits the extraction of semantic features. Meanwhile, since both shrink-masks and the margins belong to texts, detail loss, in which the margins are ignored, hinders the distinction of shrink-masks from the margins and causes ambiguous shrink-mask edges. Moreover, false-positive samples share similar visual features with shrink-masks, further degrading shrink-mask recognition. To avoid these problems, we propose a zoom text detector (ZTD) inspired by the zoom process of a camera. Specifically, a zoomed-out view module (ZOM) is introduced to provide coarse-grained optimization objectives for coarse layers to avoid feature defocusing. Meanwhile, a zoomed-in view module (ZIM) is presented to enhance margin recognition and prevent detail loss. Furthermore, a sequential-visual discriminator (SVD) is designed to suppress false-positive samples using sequential and visual features. Experiments verify the superior comprehensive performance of ZTD.
KW - Detail loss
KW - false-positive samples
KW - feature defocusing
KW - text detection
KW - zoom strategy
UR - http://www.scopus.com/inward/record.url?scp=85164436464&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2023.3289327
DO - 10.1109/TNNLS.2023.3289327
M3 - Article
C2 - 37402201
AN - SCOPUS:85164436464
SN - 2162-237X
VL - 35
SP - 15745
EP - 15757
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 11
ER -