Balancing Optimization Strategies and Practical Goals: An Efficient Scene Text Detector

Research output: Contribution to journal › Article › peer-review

Abstract

Scene text reading is a crucial task for scene understanding. Text detection, as a fundamental task in scene text reading, has recently garnered significant attention. Among various approaches, segmentation-based methods stand out for their flexible pixel-level prediction capabilities. However, two main issues remain. 1) These methods treat all text instances as a pixel set during training, causing the features of large-scale instances to dominate the model optimization process. As a result, the optimization deviates from the instance-level objectives. 2) Segmentation methods filter candidates based on pixel-level class scores, whereas what is needed is an evaluation of whether an instance is text, which also deviates from the original goals. To address these issues, we propose an Instance-Equal Feature Guide Module (IEFGM), a Cross-Level Feature Interaction Module (CLIFM), and a Pixel-Instance Fusion Discriminator (PIFD) to balance optimization strategies with practical goals. The IEFGM introduces instance-level features and positional information, guiding the model to treat instances of different scales equally at the feature level. The CLIFM encourages feature interaction across different levels, enabling the model to recognize text from various perspectives. Unlike existing methods that filter candidates using pixel-level results, the PIFD integrates both instance-level and pixel-level information to identify candidate regions, aligning with the original goals of text detection. A series of ablation studies demonstrates the effectiveness of the proposed modules. Extensive experiments across six datasets from different scenes demonstrate that our method outperforms existing state-of-the-art approaches.
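The first issue in the abstract, large instances dominating a pixel-level objective, can be illustrated with a small numeric sketch. This is not the paper's IEFGM or its loss function. It only contrasts a plain per-pixel average with a per-instance average, and the function names and the per-pixel loss values are hypothetical.

```python
def pixel_mean_loss(instances):
    """Average the loss over all text pixels, ignoring instance membership."""
    total = sum(sum(pixels) for pixels in instances)
    count = sum(len(pixels) for pixels in instances)
    return total / count


def instance_mean_loss(instances):
    """Average within each instance first, then across instances."""
    per_instance = [sum(pixels) / len(pixels) for pixels in instances]
    return sum(per_instance) / len(per_instance)


# Hypothetical per-pixel losses: a large, well-fit instance (900 pixels)
# and a small, poorly fit instance (100 pixels).
large = [0.1] * 900
small = [0.9] * 100
instances = [large, small]

pixel_loss = pixel_mean_loss(instances)        # dominated by the large instance
instance_loss = instance_mean_loss(instances)  # both instances count equally

print(f"pixel-level mean:    {pixel_loss:.2f}")
print(f"instance-level mean: {instance_loss:.2f}")
```

With these made-up numbers the pixel-level mean is about 0.18 while the instance-level mean is about 0.5, so the poorly fit small instance contributes far less to a pixel-set objective than it would if every instance were weighted equally.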

Original language: English
Pages (from-to): 426-438
Number of pages: 13
Journal: IEEE Transactions on Multimedia
Volume: 28
State: Published - 2026

Keywords

  • Object detection
  • multi-scene
  • semantic segmentation
  • text detection
