TY - JOUR
T1 - Scene video text tracking based on hybrid deep text detection and layout constraint
AU - Wang, Xihan
AU - Feng, Xiaoyi
AU - Xia, Zhaoqiang
N1 - Publisher Copyright:
© 2019 Elsevier B.V.
PY - 2019/10/21
Y1 - 2019/10/21
N2 - Video text in real-world scenes often carries rich high-level semantic information and plays an increasingly important role in content-based video analysis and retrieval. Scene video text detection and tracking are therefore important prerequisites for numerous multimedia applications. However, the performance of most existing tracking methods is unsatisfactory due to frequent mis-detections, unexpected camera motion, and similar appearances among text regions. To address these problems, we propose a new video text tracking approach based on hybrid deep text detection and a layout constraint. First, a deep text detection network that combines the advantages of object detection and semantic segmentation in a hybrid way is proposed to locate possible text candidates in individual frames. Then, text trajectories are derived from consecutive frames with a novel data association method, which effectively exploits the layout constraint of text regions under large camera motion. By utilizing the layout constraint, ambiguities caused by similar text regions are effectively reduced. We conduct experiments on four benchmark datasets, i.e., ICDAR 2015, MSRA-TD500, USTB-SV1K and Minetto, to evaluate the proposed method. The experimental results demonstrate the effectiveness and superiority of the proposed approach.
KW - Convolutional neural networks
KW - Hybrid architecture
KW - Layout constraint
KW - Scene video text
KW - Text detection and tracking
U2 - 10.1016/j.neucom.2019.05.101
DO - 10.1016/j.neucom.2019.05.101
M3 - Article
AN - SCOPUS:85071329307
SN - 0925-2312
VL - 363
SP - 223
EP - 235
JO - Neurocomputing
JF - Neurocomputing
ER -