TY - JOUR
T1 - RRTrN
T2 - A lightweight and effective backbone for scene text recognition
AU - Zhou, Qing
AU - Gao, Junyu
AU - Yuan, Yuan
AU - Wang, Qi
N1 - Publisher Copyright:
© 2023
PY - 2024/6/1
Y1 - 2024/6/1
N2 - Models based on very deep convolutional networks and attention mechanisms now have powerful feature extraction abilities, improving text recognition performance in natural scenes. However, very deep models built from stacked layers introduce a massive number of parameters, which limits the deployment of text recognition algorithms on storage-constrained devices. In this paper, we propose a lightweight and effective backbone, the Recursive Residual Transformer Network (RRTrN), for scene text recognition. Specifically, by leveraging recursive learning and a combination of convolutional layers with a transformer unit, RRTrN achieves powerful feature extraction while significantly reducing the number of parameters. This reduction facilitates the deployment of our text recognition algorithm on storage-constrained devices, making it more accessible for practical applications. Furthermore, a recursive distillation strategy is presented to balance recursive-learning inference time against performance, enhancing the practicality and efficiency of RRTrN. Extensive experiments on mainstream benchmarks and popular models verify the generalization ability of RRTrN, which achieves state-of-the-art recognition performance on five datasets. Notably, a classical STR model built on RRTrN can achieve a 3 percentage point increase in recognition accuracy or reduce the number of parameters by 80%.
KW - Lightweight and effective backbone
KW - Recursive learning
KW - Scene text recognition
UR - http://www.scopus.com/inward/record.url?scp=85180014738&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2023.122769
DO - 10.1016/j.eswa.2023.122769
M3 - Article
AN - SCOPUS:85180014738
SN - 0957-4174
VL - 243
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 122769
ER -