
Compensating for the Incomplete With the Complete: An Efficient Scene Text Detector

  • Northwestern Polytechnical University Xian

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Scene text reading is an essential component of scene understanding, and text detection, as its fundamental prerequisite, has garnered increasing attention. Among the various approaches, segmenting the text kernel and expanding it to reconstruct text instances is both efficient and effective. However, the incomplete semantic features of text kernels and the high similarity between kernels and texts make it hard to extract kernels from images accurately. To address this, we propose an efficient text detector, termed CIC, which comprises a bidirectional information transfer module (BITM), a dual knowledge integration module (DKIM), and a cross-verification module (CVM). BITM generates collaborative information between the predicted text and kernel via the proposed differentiable adaptive gap operator, which enforces mutual restraint and collaborative progress between the text and kernel predictions. DKIM, in contrast, uses a knowledge fusion scheme that helps locate kernels accurately under the guidance of the complete semantic features of texts. Intuitively, because the kernel is generated by shrinking the text, kernel pixels appear only inside the text area. Based on this criterion, the CVM further uses text predictions to constrain kernel predictions and reduce false positives. Ablation experiments demonstrate the effectiveness of the proposed BITM, DKIM, and CVM. Extensive experiments show that the proposed CIC outperforms existing state-of-the-art (SOTA) methods on five public datasets from different scenes.
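The criterion behind the CVM (kernel pixels can only lie inside the text area) can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `cross_verify`, the hard threshold `text_thresh`, and the nested-list score maps are all assumptions made for illustration, and the actual CVM may apply the constraint differently (for example, in a differentiable form during training).

```python
from typing import List

Grid = List[List[float]]  # per-pixel scores in [0, 1]; hypothetical representation


def cross_verify(kernel_prob: Grid, text_prob: Grid, text_thresh: float = 0.5) -> Grid:
    """Suppress kernel scores at pixels the text prediction does not cover.

    Assumption: a pixel counts as text when its text score is >= text_thresh.
    """
    if len(kernel_prob) != len(text_prob):
        raise ValueError("kernel and text maps must have the same height")
    verified: Grid = []
    for k_row, t_row in zip(kernel_prob, text_prob):
        if len(k_row) != len(t_row):
            raise ValueError("kernel and text maps must have the same width")
        # Keep the kernel score only where the text map also fires.
        verified.append([k if t >= text_thresh else 0.0 for k, t in zip(k_row, t_row)])
    return verified


# Toy 2x3 maps: the kernel score 0.8 at (0, 1) has no text support and is removed.
kernel = [[0.9, 0.8, 0.0], [0.1, 0.7, 0.6]]
text = [[0.95, 0.2, 0.1], [0.3, 0.9, 0.8]]
verified = cross_verify(kernel, text)
print(verified)
```

In this toy example, kernel responses outside the predicted text region are treated as false positives and zeroed, while responses inside it pass through unchanged.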

Original language: English
Pages (from-to): 12096-12108
Number of pages: 13
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 35
Issue number: 12
State: Published - 2025

Keywords

  • Real-time
  • multi-scene
  • semantic segmentation
  • text detection
