DPGS: Cross-cooperation guided dynamic points generation for scene text spotting

Wei Sun, Qianzhou Wang, Zhiqiang Hou, Xueling Chen, Qingsen Yan, Yanning Zhang

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

End-to-end text spotting aims to combine scene text detection and recognition into a unified framework. Handling the relationship between the two sub-tasks plays a pivotal role in designing effective spotters. While polygon- or segmentation-based methods eliminate heuristic post-processing, they still face challenges such as background noise and a high computational burden. In this study, we introduce DPGS, a coarse-to-fine learning framework that performs Dynamic Points Generation for text Spotting. DPGS simultaneously learns character representations for both the detection and recognition tasks. Specifically, for each text instance, we represent the character sequence as ordered points and model them with learnable point queries. This approach progressively selects appropriate key points covering the characters and leverages group attention to associate similar information from different positions, improving detection accuracy. After passing through a single decoder, the point queries encode text semantics and locations, enabling simple prediction heads to decode the central line, boundary, script, and confidence of each text instance. Additionally, we introduce an adaptive cooperative criterion that combines more useful feature knowledge, enhancing training efficiency. Extensive experiments demonstrate the superiority of DPGS on scene text detection and recognition tasks. Compared to the respective top-1 methods, DPGS improves average recognition accuracy by 3.7%, 1.9%, and 0.7% on the Total-Text, ICDAR15, and CTW1500 datasets, respectively.
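The pipeline the abstract outlines — learnable point queries passed through a single decoder, then lightweight prediction heads for central line, boundary, script, and confidence — can be sketched roughly as follows. All sizes, the single-step attention, and the linear heads are illustrative assumptions for one text instance, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (assumptions, not from the paper): N point queries
# per text instance, each a d-dimensional embedding.
N, d = 25, 64          # point queries per instance, embedding dim
num_scripts = 8        # assumed number of script classes

queries = rng.standard_normal((N, d))    # learnable point queries
memory = rng.standard_normal((100, d))   # flattened encoder image features

def attend(q, kv):
    """One cross-attention step: queries gather information from features
    (stand-in for the paper's single decoder)."""
    scores = q @ kv.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ kv

decoded = attend(queries, memory)        # (N, d) decoded point representations

# Simple linear heads, one per output named in the abstract.
W_center = rng.standard_normal((d, 2))            # central-line point (x, y)
W_bound  = rng.standard_normal((d, 4))            # boundary offsets
W_script = rng.standard_normal((d, num_scripts))  # script logits
W_conf   = rng.standard_normal((d, 1))            # confidence logit

center   = decoded @ W_center                      # (N, 2)
boundary = decoded @ W_bound                       # (N, 4)
script   = decoded @ W_script                      # (N, num_scripts)
conf     = 1 / (1 + np.exp(-(decoded @ W_conf)))   # (N, 1), in (0, 1)
```

In the actual model the decoder is trained end-to-end so that each point query settles on a character position; this sketch only shows how a shared set of decoded point features can feed all four heads at once.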

Original language: English
Article number: 112399
Journal: Knowledge-Based Systems
Volume: 302
DOI
Publication status: Published - 25 Oct 2024
