TY - JOUR
T1 - Document Image Binarization with Feedback for Improving Character Segmentation
AU - Chi, Zheru
AU - Wang, Qing
N1 - Publisher Copyright:
© 2005 World Scientific Publishing Company.
PY - 2005/4/1
Y1 - 2005/4/1
AB - Binarization of gray-scale document images is one of the most important steps in automatic document image processing. In this paper, we present a two-stage document image binarization approach: a top-down region-based binarization at the first stage and, after a feedback check, a neural-network-based binarization of the problematic blocks at the second stage. Our two-stage approach is particularly effective for binarizing images of highlighted or marked text. The region-based binarization method is fast and suitable for processing large document images. However, it suffers from two unavoidable problems, the block effect and regional edge noise, which lead to poor character segmentation and recognition. A neural-network-based classifier can achieve good performance on two-class classification problems such as the binarization of gray-level document images, but it is computationally costly. In our two-stage approach, feedback criteria are used to keep the well-binarized blocks from the first stage and to re-binarize the problematic blocks at the second stage with the neural network binarizer, improving character segmentation quality. Experimental results on a number of document images show that our two-stage approach performs better than the single-stage binarization techniques tested in terms of both character segmentation quality and computational cost.
KW - BP algorithm
KW - character segmentation
KW - Document image processing
KW - multi-layer perceptron
KW - page segmentation
KW - region-based binarization
KW - thresholding
UR - http://www.scopus.com/inward/record.url?scp=85073070949&partnerID=8YFLogxK
U2 - 10.1142/S0219467805001768
DO - 10.1142/S0219467805001768
M3 - Article
AN - SCOPUS:85073070949
SN - 0219-4678
VL - 5
SP - 281
EP - 309
JO - International Journal of Image and Graphics
JF - International Journal of Image and Graphics
IS - 2
ER -