TY - JOUR
T1 - RRTrN
T2 - A lightweight and effective backbone for scene text recognition
AU - Zhou, Qing
AU - Gao, Junyu
AU - Yuan, Yuan
AU - Wang, Qi
N1 - Publisher Copyright:
© 2023
PY - 2024/6/1
Y1 - 2024/6/1
N2 - Models based on very deep convolutional networks and attention mechanisms now have powerful feature extraction abilities, improving text recognition performance in natural scenes. However, very deep models built from stacked layers introduce a massive number of parameters, which limits the deployment of text recognition algorithms on storage-constrained devices. In this paper, we propose a lightweight and effective backbone, the Recursive Residual Transformer Network (RRTrN), for scene text recognition. Specifically, by leveraging recursive learning and a combination of convolutional layers with a transformer unit, RRTrN achieves powerful feature extraction while significantly reducing the number of parameters. This reduction facilitates the deployment of our text recognition algorithm on storage-constrained devices, making it more accessible for practical applications. Furthermore, a recursive distillation strategy is presented to balance recursive-learning inference time against performance, enhancing the practicality and efficiency of RRTrN. Extensive experiments on mainstream benchmarks and popular models verify the generalization ability of RRTrN, which achieves state-of-the-art recognition performance on five datasets. Notably, a classical STR model built on RRTrN can achieve a 3 percentage point increase in recognition accuracy or reduce the number of parameters by 80%.
KW - Lightweight and effective backbone
KW - Recursive learning
KW - Scene text recognition
UR - http://www.scopus.com/inward/record.url?scp=85180014738&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2023.122769
DO - 10.1016/j.eswa.2023.122769
M3 - Article
AN - SCOPUS:85180014738
SN - 0957-4174
VL - 243
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 122769
ER -