A holistic representation guided attention network for scene text recognition

Lu Yang; Peng Wang; Hui Li; Zhen Li; Yanning Zhang

doi:10.1016/j.neucom.2020.07.010

School of Computer Science

Research output: Contribution to journal › Article › peer-review

59 Citations (Scopus)

Abstract

Reading irregular scene text of arbitrary shape in natural images remains a challenging problem despite recent progress. Many existing approaches incorporate sophisticated network structures to handle various shapes, use extra annotations for stronger supervision, or employ hard-to-train recurrent neural networks for sequence modeling. In this work, we propose a simple yet strong approach for scene text recognition. With no need to convert input images to sequence representations, we directly connect two-dimensional CNN features to an attention-based sequence decoder that is guided by a holistic representation. The holistic representation guides the attention-based decoder to focus on more accurate areas. As no recurrent module is adopted, our model can be trained in parallel. It achieves a 1.5× to 9.4× speedup in the backward pass and a 1.3× to 7.9× speedup in the forward pass compared with its RNN counterparts. The proposed model is trained with only word-level annotations. With this simple design, our method achieves state-of-the-art or competitive recognition performance on the evaluated regular and irregular scene text benchmark datasets.

Original language	English
Pages (from-to)	67-75
Number of pages	9
Journal	Neurocomputing
Volume	414
DOI	https://doi.org/10.1016/j.neucom.2020.07.010
Publication status	Published - 13 Nov 2020

Cite this

@article{a9b8fafa57d0414c93e0ca57d8056988,

title = "A holistic representation guided attention network for scene text recognition",

abstract = "Reading irregular scene text of arbitrary shape in natural images is still a challenging problem, despite the progress made recently. Many existing approaches incorporate sophisticated network structures to handle various shapes, use extra annotations for stronger supervision, or employ hard-to-train recurrent neural networks for sequence modeling. In this work, we propose a simple yet strong approach for scene text recognition. With no need to convert input images to sequence representations, we directly connect two-dimensional CNN features to an attention-based sequence decoder which guided by holistic representation. The holistic representation can guide the attention-based decoder focus on more accurate area. As no recurrent module is adopted, our model can be trained in parallel. It achieves 1.5× to 9.4× acceleration to backward pass and 1.3× to 7.9× acceleration to forward pass, compared with the RNN counterparts. The proposed model is trained with only word-level annotations. With this simple design, our method achieves state-of-the-art or competitive recognition performance on the evaluated regular and irregular scene text benchmark datasets.",

keywords = "Convolutional-Attention, Holistic Representation, Scene Text Recognition, Transformer",

author = "Lu Yang and Peng Wang and Hui Li and Zhen Li and Yanning Zhang",

note = "Publisher Copyright: {\textcopyright} 2020 Elsevier B.V.",

year = "2020",

month = nov,

day = "13",

doi = "10.1016/j.neucom.2020.07.010",

language = "English",

volume = "414",

pages = "67--75",

journal = "Neurocomputing",

issn = "0925-2312",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - A holistic representation guided attention network for scene text recognition

AU - Yang, Lu

AU - Wang, Peng

AU - Li, Hui

AU - Li, Zhen

AU - Zhang, Yanning

PY - 2020/11/13

Y1 - 2020/11/13

N2 - Reading irregular scene text of arbitrary shape in natural images is still a challenging problem, despite the progress made recently. Many existing approaches incorporate sophisticated network structures to handle various shapes, use extra annotations for stronger supervision, or employ hard-to-train recurrent neural networks for sequence modeling. In this work, we propose a simple yet strong approach for scene text recognition. With no need to convert input images to sequence representations, we directly connect two-dimensional CNN features to an attention-based sequence decoder which guided by holistic representation. The holistic representation can guide the attention-based decoder focus on more accurate area. As no recurrent module is adopted, our model can be trained in parallel. It achieves 1.5× to 9.4× acceleration to backward pass and 1.3× to 7.9× acceleration to forward pass, compared with the RNN counterparts. The proposed model is trained with only word-level annotations. With this simple design, our method achieves state-of-the-art or competitive recognition performance on the evaluated regular and irregular scene text benchmark datasets.

AB - Reading irregular scene text of arbitrary shape in natural images is still a challenging problem, despite the progress made recently. Many existing approaches incorporate sophisticated network structures to handle various shapes, use extra annotations for stronger supervision, or employ hard-to-train recurrent neural networks for sequence modeling. In this work, we propose a simple yet strong approach for scene text recognition. With no need to convert input images to sequence representations, we directly connect two-dimensional CNN features to an attention-based sequence decoder which guided by holistic representation. The holistic representation can guide the attention-based decoder focus on more accurate area. As no recurrent module is adopted, our model can be trained in parallel. It achieves 1.5× to 9.4× acceleration to backward pass and 1.3× to 7.9× acceleration to forward pass, compared with the RNN counterparts. The proposed model is trained with only word-level annotations. With this simple design, our method achieves state-of-the-art or competitive recognition performance on the evaluated regular and irregular scene text benchmark datasets.

KW - Convolutional-Attention

KW - Holistic Representation

KW - Scene Text Recognition

KW - Transformer

UR - http://www.scopus.com/inward/record.url?scp=85088897206&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2020.07.010

DO - 10.1016/j.neucom.2020.07.010

M3 - Article

AN - SCOPUS:85088897206

SN - 0925-2312

VL - 414

SP - 67

EP - 75

JO - Neurocomputing

JF - Neurocomputing

ER -
