Zoom Text Detector

Chuang Yang; Mulin Chen; Yuan Yuan; Qi Wang

doi:10.1109/TNNLS.2023.3289327

School of Artificial Intelligence, OPtics and Electronics

Northwestern Polytechnical University Xian

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

To pursue comprehensive performance, recent text detectors improve detection speed at the expense of accuracy. They adopt shrink-mask-based text representation strategies, which makes detection accuracy highly dependent on the shrink-masks. Unfortunately, three disadvantages make shrink-masks unreliable. First, these methods try to strengthen the discrimination of shrink-masks from the background using semantic information. However, the feature defocusing phenomenon, in which coarse layers are optimized by fine-grained objectives, limits the extraction of semantic features. Second, since both shrink-masks and the margins belong to texts, the detail loss phenomenon, in which the margins are ignored, hinders distinguishing shrink-masks from the margins and causes ambiguous shrink-mask edges. Third, false-positive samples share similar visual features with shrink-masks, which further degrades shrink-mask recognition. To avoid these problems, we propose a zoom text detector (ZTD) inspired by the zoom process of a camera. Specifically, a zoomed-out view module (ZOM) is introduced to provide coarse-grained optimization objectives for coarse layers and so avoid feature defocusing. Meanwhile, a zoomed-in view module (ZIM) is presented to enhance margin recognition and prevent detail loss. Furthermore, a sequential-visual discriminator (SVD) is designed to suppress false-positive samples using sequential and visual features. Experiments verify the superior comprehensive performance of ZTD.

Original language	English
Pages (from-to)	15745-15757
Number of pages	13
Journal	IEEE Transactions on Neural Networks and Learning Systems
Volume	35
Issue number	11
DOIs	https://doi.org/10.1109/TNNLS.2023.3289327
State	Published - 2024

Keywords

Detail loss
false-positive samples
feature defocusing
text detection
zoom strategy

Cite this

@article{24c7eef38a4d4138ba940cb0a6f0b420,

title = "Zoom Text Detector",

abstract = "To pursue comprehensive performance, recent text detectors improve detection speed at the expense of accuracy. They adopt shrink-mask-based text representation strategies, which leads to a high dependence of detection accuracy on shrink-masks. Unfortunately, three disadvantages cause unreliable shrink-masks. Specifically, these methods try to strengthen the discrimination of shrink-masks from the background by semantic information. However, the feature defocusing phenomenon that coarse layers are optimized by fine-grained objectives limits the extraction of semantic features. Meanwhile, since both shrink-masks and the margins belong to texts, the detail loss phenomenon that the margins are ignored hinders the distinguishment of shrink-masks from the margins, which causes ambiguous shrink-mask edges. Moreover, false-positive samples enjoy similar visual features with shrink-masks. They aggravate the decline of shrink-masks recognition. To avoid the above problems, we propose a zoom text detector (ZTD) inspired by the zoom process of the camera. Specifically, zoomed-out view module (ZOM) is introduced to provide coarse-grained optimization objectives for coarse layers to avoid feature defocusing. Meanwhile, zoomed-in view module (ZIM) is presented to enhance the margins recognition to prevent detail loss. Furthermore, sequential-visual discriminator (SVD) is designed to suppress false-positive samples by sequential and visual features. Experiments verify the superior comprehensive performance of ZTD.",

keywords = "Detail loss, false-positive samples, feature defocusing, text detection, zoom strategy",

author = "Chuang Yang and Mulin Chen and Yuan Yuan and Qi Wang",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.",

year = "2024",

doi = "10.1109/TNNLS.2023.3289327",

language = "English",

volume = "35",

pages = "15745--15757",

journal = "IEEE Transactions on Neural Networks and Learning Systems",

issn = "2162-237X",

publisher = "IEEE Computational Intelligence Society",

number = "11",

}

TY - JOUR

T1 - Zoom Text Detector

AU - Yang, Chuang

AU - Chen, Mulin

AU - Yuan, Yuan

AU - Wang, Qi

PY - 2024

Y1 - 2024

AB - To pursue comprehensive performance, recent text detectors improve detection speed at the expense of accuracy. They adopt shrink-mask-based text representation strategies, which leads to a high dependence of detection accuracy on shrink-masks. Unfortunately, three disadvantages cause unreliable shrink-masks. Specifically, these methods try to strengthen the discrimination of shrink-masks from the background by semantic information. However, the feature defocusing phenomenon that coarse layers are optimized by fine-grained objectives limits the extraction of semantic features. Meanwhile, since both shrink-masks and the margins belong to texts, the detail loss phenomenon that the margins are ignored hinders the distinguishment of shrink-masks from the margins, which causes ambiguous shrink-mask edges. Moreover, false-positive samples enjoy similar visual features with shrink-masks. They aggravate the decline of shrink-masks recognition. To avoid the above problems, we propose a zoom text detector (ZTD) inspired by the zoom process of the camera. Specifically, zoomed-out view module (ZOM) is introduced to provide coarse-grained optimization objectives for coarse layers to avoid feature defocusing. Meanwhile, zoomed-in view module (ZIM) is presented to enhance the margins recognition to prevent detail loss. Furthermore, sequential-visual discriminator (SVD) is designed to suppress false-positive samples by sequential and visual features. Experiments verify the superior comprehensive performance of ZTD.

KW - Detail loss

KW - false-positive samples

KW - feature defocusing

KW - text detection

KW - zoom strategy

UR - http://www.scopus.com/inward/record.url?scp=85164436464&partnerID=8YFLogxK

U2 - 10.1109/TNNLS.2023.3289327

DO - 10.1109/TNNLS.2023.3289327

M3 - Article

C2 - 37402201

AN - SCOPUS:85164436464

SN - 2162-237X

VL - 35

SP - 15745

EP - 15757

JO - IEEE Transactions on Neural Networks and Learning Systems

JF - IEEE Transactions on Neural Networks and Learning Systems

IS - 11

ER -
