Spotlight Text Detector: Spotlight on Candidate Regions Like a Camera

Xu Han; Junyu Gao; Chuang Yang; Yuan Yuan; Qi Wang

doi:10.1109/TMM.2024.3521824

Spotlight Text Detector: Spotlight on Candidate Regions Like a Camera

Xu Han, Junyu Gao, Chuang Yang, Yuan Yuan, Qi Wang

光电与智能研究院

Northwestern Polytechnical University Xian

科研成果: 期刊稿件 › 文章 › 同行评审

摘要

The irregular contour representation is one of the tough challenges in scene text detection. Although segmentation-based methods have achieved significant progress with the help of flexible pixel prediction, the overlap of geographically close texts hinders detecting them separately. To alleviate this problem, some shrink-based methods predict text kernels and expand them to restructure texts. However, the text kernel is an artificial object with incomplete semantic features that are prone to incorrect or missing detection. In addition, different from the general objects, the geometry features (aspect ratio, scale, and shape) of scene texts vary significantly, which makes it difficult to detect them accurately. To consider the above problems, we propose an effective spotlight text detector (STD), which consists of a spotlight calibration module (SCM) and a multivariate information extraction module (MIEM). The former concentrates efforts on the candidate kernel, like a camera focus on the target. It obtains candidate features through a mapping filter and calibrates them precisely to eliminate some false positive samples. The latter designs different shape schemes to explore multiple geometric features for scene texts. It helps extract various spatial relationships to improve the model's ability to recognize kernel regions. Ablation studies prove the effectiveness of the designed SCM and MIEM. Extensive experiments verify that our STD is superior to existing state-of-the-art methods on various datasets, including ICDAR2015, CTW1500, MSRA-TD500, and Total-Text.

源语言	英语
页（从-至）	1937-1949
页数	13
期刊	IEEE Transactions on Multimedia
卷	27
DOI	https://doi.org/10.1109/TMM.2024.3521824
出版状态	已出版 - 2025

访问文件

10.1109/TMM.2024.3521824

其它文件与链接

链接到 Scopus 的出版物

引用此

@article{d0643be7d0064371a2fec66684002cb2,

title = "Spotlight Text Detector: Spotlight on Candidate Regions Like a Camera",

abstract = "The irregular contour representation is one of the tough challenges in scene text detection. Although segmentation-based methods have achieved significant progress with the help of flexible pixel prediction, the overlap of geographically close texts hinders detecting them separately. To alleviate this problem, some shrink-based methods predict text kernels and expand them to restructure texts. However, the text kernel is an artificial object with incomplete semantic features that are prone to incorrect or missing detection. In addition, different from the general objects, the geometry features (aspect ratio, scale, and shape) of scene texts vary significantly, which makes it difficult to detect them accurately. To consider the above problems, we propose an effective spotlight text detector (STD), which consists of a spotlight calibration module (SCM) and a multivariate information extraction module (MIEM). The former concentrates efforts on the candidate kernel, like a camera focus on the target. It obtains candidate features through a mapping filter and calibrates them precisely to eliminate some false positive samples. The latter designs different shape schemes to explore multiple geometric features for scene texts. It helps extract various spatial relationships to improve the model's ability to recognize kernel regions. Ablation studies prove the effectiveness of the designed SCM and MIEM. Extensive experiments verify that our STD is superior to existing state-of-the-art methods on various datasets, including ICDAR2015, CTW1500, MSRA-TD500, and Total-Text.",

keywords = "Text detection, arbitrary-shaped text, segmentation refinement",

author = "Xu Han and Junyu Gao and Chuang Yang and Yuan Yuan and Qi Wang",

note = "Publisher Copyright: {\textcopyright} 1999-2012 IEEE.",

year = "2025",

doi = "10.1109/TMM.2024.3521824",

language = "英语",

volume = "27",

pages = "1937--1949",

journal = "IEEE Transactions on Multimedia",

issn = "1520-9210",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - Spotlight Text Detector

T2 - Spotlight on Candidate Regions Like a Camera

AU - Han, Xu

AU - Gao, Junyu

AU - Yang, Chuang

AU - Yuan, Yuan

AU - Wang, Qi

PY - 2025

Y1 - 2025

N2 - The irregular contour representation is one of the tough challenges in scene text detection. Although segmentation-based methods have achieved significant progress with the help of flexible pixel prediction, the overlap of geographically close texts hinders detecting them separately. To alleviate this problem, some shrink-based methods predict text kernels and expand them to restructure texts. However, the text kernel is an artificial object with incomplete semantic features that are prone to incorrect or missing detection. In addition, different from the general objects, the geometry features (aspect ratio, scale, and shape) of scene texts vary significantly, which makes it difficult to detect them accurately. To consider the above problems, we propose an effective spotlight text detector (STD), which consists of a spotlight calibration module (SCM) and a multivariate information extraction module (MIEM). The former concentrates efforts on the candidate kernel, like a camera focus on the target. It obtains candidate features through a mapping filter and calibrates them precisely to eliminate some false positive samples. The latter designs different shape schemes to explore multiple geometric features for scene texts. It helps extract various spatial relationships to improve the model's ability to recognize kernel regions. Ablation studies prove the effectiveness of the designed SCM and MIEM. Extensive experiments verify that our STD is superior to existing state-of-the-art methods on various datasets, including ICDAR2015, CTW1500, MSRA-TD500, and Total-Text.

AB - The irregular contour representation is one of the tough challenges in scene text detection. Although segmentation-based methods have achieved significant progress with the help of flexible pixel prediction, the overlap of geographically close texts hinders detecting them separately. To alleviate this problem, some shrink-based methods predict text kernels and expand them to restructure texts. However, the text kernel is an artificial object with incomplete semantic features that are prone to incorrect or missing detection. In addition, different from the general objects, the geometry features (aspect ratio, scale, and shape) of scene texts vary significantly, which makes it difficult to detect them accurately. To consider the above problems, we propose an effective spotlight text detector (STD), which consists of a spotlight calibration module (SCM) and a multivariate information extraction module (MIEM). The former concentrates efforts on the candidate kernel, like a camera focus on the target. It obtains candidate features through a mapping filter and calibrates them precisely to eliminate some false positive samples. The latter designs different shape schemes to explore multiple geometric features for scene texts. It helps extract various spatial relationships to improve the model's ability to recognize kernel regions. Ablation studies prove the effectiveness of the designed SCM and MIEM. Extensive experiments verify that our STD is superior to existing state-of-the-art methods on various datasets, including ICDAR2015, CTW1500, MSRA-TD500, and Total-Text.

KW - Text detection

KW - arbitrary-shaped text

KW - segmentation refinement

UR - http://www.scopus.com/inward/record.url?scp=85213470349&partnerID=8YFLogxK

U2 - 10.1109/TMM.2024.3521824

DO - 10.1109/TMM.2024.3521824

M3 - 文章

AN - SCOPUS:85213470349

SN - 1520-9210

VL - 27

SP - 1937

EP - 1949

JO - IEEE Transactions on Multimedia

JF - IEEE Transactions on Multimedia

ER -

Spotlight Text Detector: Spotlight on Candidate Regions Like a Camera

摘要

访问文件

其它文件与链接

指纹

引用此