Script determination of mixed Chinese/English document images using Kolmogorov complexity measure

Zheru Chi, Qing Wang

Research output: Contribution to journal, Conference article, Peer-reviewed

Abstract

In this paper, we propose an approach based on a Kolmogorov complexity (KC) measure for determining script classes in mixed Chinese (complex characters)/English document images. The approach consists of two main steps, document image preprocessing and KC measurement, and successfully separates Chinese text lines from English ones. It is robust to document images of different appearances and densities, and to the various fonts, sizes, and styles of characters used in documents. Experimental results on a set of 40 text-line images (20 English text lines and 20 complex Chinese text lines) drawn from various document images show that a 100% correct classification rate can be achieved.
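The abstract does not say how the KC measure is computed, so the following is only a rough sketch of the two-step idea (preprocess, then measure complexity), not the authors' method. It assumes a compression-based proxy for Kolmogorov complexity (zlib-compressed size per pixel), a fixed binarization threshold, and that lines of complex Chinese characters score higher than English lines. The function names and the `kc_threshold` parameter are hypothetical, and the threshold would need calibrating on labeled text lines.

```python
import zlib


def binarize(gray_rows, threshold=128):
    """Preprocessing stand-in: map grayscale pixels (0-255) to 0/1, where 1 = ink.

    The fixed threshold is an assumption; the paper's preprocessing is not
    described in the abstract.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray_rows]


def kc_estimate(binary_rows):
    """Compression-based proxy for Kolmogorov complexity.

    Returns compressed bytes per pixel. True Kolmogorov complexity is
    uncomputable, so any practical measure is an approximation like this one.
    """
    flat = bytes(px for row in binary_rows for px in row)
    if not flat:
        raise ValueError("empty image")
    return len(zlib.compress(flat, 9)) / len(flat)


def classify_line(gray_rows, kc_threshold):
    """Label a text-line image by comparing its KC estimate to a threshold.

    Assumption: denser, more intricate Chinese glyphs give a higher estimate
    than English text; kc_threshold is hypothetical and must be calibrated.
    """
    score = kc_estimate(binarize(gray_rows))
    return "Chinese" if score > kc_threshold else "English"
```

In use, one would compute `kc_estimate` for a handful of labeled Chinese and English text lines, then pick a `kc_threshold` that separates the two groups before calling `classify_line` on unseen lines.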

Original language: English
Pages (from-to): 686-692
Number of pages: 7
Journal: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 4875
Issue number: 2
Publication status: Published - 2002
Event: Second International Conference on Image and Graphics - Hefei, China
Duration: 16 Aug 2002 to 18 Aug 2002
