Scene video text tracking based on hybrid deep text detection and layout constraint

Research output: Contribution to journal › Article › peer-review

20 Scopus citations

Abstract

Video text in real-world scenes often carries rich high-level semantic information and plays an increasingly important role in content-based video analysis and retrieval. Scene video text detection and tracking are therefore important prerequisites for numerous multimedia applications. However, the performance of most existing tracking methods is unsatisfactory due to frequent mis-detections, unexpected camera motion and similar appearances among text regions. To address these problems, we propose a new video text tracking approach based on hybrid deep text detection and a layout constraint. First, a deep text detection network that combines the advantages of object detection and semantic segmentation in a hybrid way locates possible text candidates in individual frames. Then, text trajectories are derived from consecutive frames with a novel data association method that effectively exploits the layout constraint of text regions under large camera motion. By utilizing the layout constraint, the ambiguities caused by similar text regions are effectively reduced. We evaluate the proposed method on four benchmark datasets: ICDAR 2015, MSRA-TD500, USTB-SV1K and Minetto. The experimental results demonstrate the effectiveness and superiority of the proposed approach.
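The abstract does not give the exact detection network or association formulation, but the layout-constraint idea can be illustrated with a rough sketch. The Python code below matches text boxes between two frames by combining an IoU term with a layout-consistency term that compares each box's offsets to the other detections; because global camera motion shifts all boxes by roughly the same amount, these relative offsets stay stable even when the frame moves sharply. The weight `w_layout` and the greedy offset-matching heuristic are assumptions for illustration, not the authors' formulation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def center(box):
    """Center point of a [x1, y1, x2, y2] box."""
    return np.array([(box[0] + box[2]) / 2, (box[1] + box[3]) / 2])


def layout_cost(boxes_prev, boxes_curr, i, j):
    """Disagreement between the relative layouts of box i and box j.

    Each box is described by its offsets to the remaining detections in
    its own frame; these offsets are largely invariant to global camera
    motion, so a low value means the candidate match preserves the
    spatial arrangement of the surrounding text regions.
    """
    offs_prev = [center(b) - center(boxes_prev[i])
                 for k, b in enumerate(boxes_prev) if k != i]
    offs_curr = [center(b) - center(boxes_curr[j])
                 for k, b in enumerate(boxes_curr) if k != j]
    if not offs_prev or not offs_curr:
        return 0.0
    # Greedy nearest-offset distance as a cheap layout-consistency score.
    d = [min(np.linalg.norm(p - q) for q in offs_curr) for p in offs_prev]
    return float(np.mean(d))


def associate(boxes_prev, boxes_curr, w_layout=0.01):
    """Hungarian assignment over a combined geometry + layout cost."""
    cost = np.zeros((len(boxes_prev), len(boxes_curr)))
    for i in range(len(boxes_prev)):
        for j in range(len(boxes_curr)):
            cost[i, j] = (1.0 - iou(boxes_prev[i], boxes_curr[j])
                          + w_layout * layout_cost(boxes_prev, boxes_curr, i, j))
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))
```

In a full tracker, detections left unassigned (or matched at too high a cost) would start new trajectories or terminate existing ones; the sketch only covers the frame-to-frame association step.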

Original language: English
Pages (from-to): 223-235
Number of pages: 13
Journal: Neurocomputing
Volume: 363
State: Published - 21 Oct 2019

Keywords

  • Convolutional neural networks
  • Hybrid architecture
  • Layout constraint
  • Scene video text
  • Text detection and tracking

