Title: Binarization of Badly Illuminated Document Images through Shading Estimation and Compensation
Shijian Lu (dcslsj_at_nus.edu.sg) and Chew Lim Tan (tancl_at_comp.nus.edu.sg)
School of Computing, National University of Singapore
Introduction
Digital libraries and paperless offices produce a huge number of document
images that must be thresholded before OCR. However, scanned documents often
suffer from shading degradation, as illustrated in Figure 1. A fast and
efficient document thresholding technique that tolerates shading degradation
would therefore greatly facilitate document binarization.
For the horizontal scan line shown in Figure 1(a), Figure 4(a-b) shows the
pixel gray levels after shading correction.
Figure 1 Shaded sample documents
Our Methods
Figure 4 Pixel gray level after the shading
correction
We binarize shaded document images using the shading variation estimated by
a two-dimensional Savitzky-Golay filter, which fits a polynomial surface to
two-dimensional data, such as image intensity, as follows:
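The filtering step can be illustrated with a minimal NumPy sketch. It assumes a square window of hypothetical half-width `half` and a cubic local polynomial f(x, y) = sum over i + j <= 3 of a_ij x^i y^j; the poster does not state the window size. Inside each window the filter least-squares fits f and keeps the fitted centre value a_00, which is a fixed linear combination of the window pixels, so the whole filter reduces to one precomputed correlation kernel. The names `sg_kernel_2d` and `sg_smooth_2d` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sg_kernel_2d(half: int, order: int = 3) -> np.ndarray:
    """Correlation kernel of a 2-D Savitzky-Golay filter.

    A polynomial sum_{i+j<=order} a_ij * x^i * y^j is least-squares fitted
    to a (2*half+1) x (2*half+1) window; the smoothed value at the window
    centre is a_00, a fixed linear combination of the window pixels.
    """
    offs = np.arange(-half, half + 1, dtype=float)
    ys, xs = np.meshgrid(offs, offs, indexing="ij")  # axis 0 = y, axis 1 = x
    terms = [(i, j) for i in range(order + 1) for j in range(order + 1 - i)]
    design = np.stack([xs.ravel() ** i * ys.ravel() ** j for i, j in terms], axis=1)
    # terms[0] == (0, 0), so row 0 of the pseudo-inverse yields a_00.
    return np.linalg.pinv(design)[0].reshape(xs.shape)


def sg_smooth_2d(img: np.ndarray, half: int = 7, order: int = 3) -> np.ndarray:
    """Smooth a gray-level image with a 2-D Savitzky-Golay filter.

    ASSUMPTION: window half-width and reflect padding are illustrative choices.
    """
    kernel = sg_kernel_2d(half, order)
    # Reflect-pad so every output pixel has a full window (border values are approximate).
    padded = np.pad(np.asarray(img, dtype=float), half, mode="reflect")
    windows = sliding_window_view(padded, kernel.shape)  # shape (H, W, 2h+1, 2h+1)
    return np.einsum("hwij,ij->hw", windows, kernel)
```

For a gray-level array `gray`, `surface = sg_smooth_2d(gray, half=15)` would give a first-round smoothing surface. Precomputing the kernel once avoids solving a least-squares problem per pixel; away from the borders, any cubic surface passes through the filter unchanged.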
Figure 5(a) below shows the compensated document
image where shading has been corrected. Figure
5(b) shows the final binarization results.
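Once the shading has been compensated, the background should be roughly flat, so a single global threshold may be enough. The poster does not say which threshold is applied to the compensated image. As an assumption, the sketch below uses Otsu's method (one of the compared baselines), implemented directly in NumPy on a 0-255 gray scale; `otsu_threshold` and `binarize` are hypothetical names.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Otsu's global threshold for 0-255 gray levels (pixels <= t form class 0)."""
    g = np.clip(np.rint(gray), 0, 255).astype(np.uint8)
    p = np.bincount(g.ravel(), minlength=256) / g.size
    omega = np.cumsum(p)                    # class-0 probability for each t
    mu = np.cumsum(p * np.arange(256))      # class-0 cumulative first moment
    mu_t = mu[-1]                           # global mean gray level
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    between[~np.isfinite(between)] = 0.0    # empty classes contribute nothing
    return int(np.argmax(between))


def binarize(compensated: np.ndarray) -> np.ndarray:
    """ASSUMPTION: Otsu on the compensated image. True = background, False = text."""
    g = np.clip(np.rint(compensated), 0, 255)
    return g > otsu_threshold(g)
```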
Figure 2(a) below shows the fitted first-round cubic polynomial surface.
Figure 2(b) shows the surface value along the horizontal scan line shown in
Figure 1(a).
Figure 5 Compensated document image and the
binarization results
Experimental Results
Thirty badly illuminated document images were created to test the
performance of the proposed document thresholding technique. We also compare
our method with Otsu's, Niblack's, and Sauvola's methods. Figures 6-7 below
show two examples.
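For reference, the three baselines use the following standard thresholds. Otsu's method picks the single global threshold t that maximizes the between-class variance of the gray-level histogram; Niblack's and Sauvola's methods compute a threshold for every pixel from the local mean m and standard deviation s in a window around it. The constants shown are the commonly cited defaults and may differ from the settings used in these experiments.

```latex
% Otsu: omega_0, omega_1 are class probabilities, mu_0, mu_1 class means at threshold t
t^{*} = \arg\max_{t}\; \omega_0(t)\,\omega_1(t)\,\bigl[\mu_0(t) - \mu_1(t)\bigr]^2
% Niblack: local mean m and standard deviation s over a w x w window
T_{\mathrm{Niblack}}(x,y) = m(x,y) + k\,s(x,y), \qquad k \approx -0.2
% Sauvola: R is the dynamic range of the standard deviation (128 for 8-bit images)
T_{\mathrm{Sauvola}}(x,y) = m(x,y)\left[1 + k\left(\frac{s(x,y)}{R} - 1\right)\right], \qquad k \approx 0.5,\; R = 128
```

Because Niblack's and Sauvola's thresholds depend on local statistics, they cost more per pixel than a single global threshold, which is relevant to the speed comparison in Table 1.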
Figure 2 First round cubic smoothing surface
The document background is then roughly estimated from the first-round
polynomial surface as follows:
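The poster's background-estimation formula is not recoverable from this text, so the following is only a plausible sketch under a stated assumption: text is darker than the first-round smoothing surface, so pixels more than a hypothetical `margin` gray levels below the surface are treated as text and replaced by the surface value, while all other pixels are kept as background samples. The function name `estimate_background` is hypothetical.

```python
import numpy as np


def estimate_background(gray: np.ndarray, surface: np.ndarray,
                        margin: float = 0.0) -> tuple[np.ndarray, np.ndarray]:
    """Rough background estimate from a first-round smoothing surface.

    ASSUMPTION (not the poster's formula): pixels darker than
    surface - margin are text and are replaced by the surface value.
    Returns the background image and a boolean mask of kept background pixels.
    """
    gray = np.asarray(gray, dtype=float)
    surface = np.asarray(surface, dtype=float)
    is_text = gray < surface - margin
    background = np.where(is_text, surface, gray)
    return background, ~is_text
```

The returned mask is useful if the second-round fit should use only pixels judged to be background.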
The second-round smoothing then fits a cubic polynomial surface to the
estimated document background, as shown in Figure 3(a) below. Figure 3(b)
shows the surface value along the horizontal scan line shown in Figure 1(a).
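To show what fitting a cubic surface to the background involves, the sketch below uses a simpler stand-in for the second-round smoothing: one global least-squares cubic surface, sum over i + j <= 3 of a_ij x^i y^j, over the whole image rather than a sliding Savitzky-Golay window. This substitution is an assumption. The hypothetical `mask` parameter restricts the fit to pixels judged to be background, and `fit_cubic_surface` is likewise a hypothetical name.

```python
from typing import Optional

import numpy as np


def fit_cubic_surface(img: np.ndarray, mask: Optional[np.ndarray] = None) -> np.ndarray:
    """Least-squares fit of one global cubic surface sum_{i+j<=3} a_ij x^i y^j.

    ASSUMPTION: a global fit stands in for the poster's second-round smoothing.
    `mask` selects the pixels used in the fit (e.g. estimated background).
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    # Scale coordinates to [-1, 1] so the cubic terms stay well conditioned.
    x = 2.0 * xs / max(w - 1, 1) - 1.0
    y = 2.0 * ys / max(h - 1, 1) - 1.0
    terms = [(i, j) for i in range(4) for j in range(4 - i)]
    design = np.stack([(x ** i * y ** j).ravel() for i, j in terms], axis=1)
    z = np.asarray(img, dtype=float).ravel()
    use = np.ones(z.shape, dtype=bool) if mask is None else np.asarray(mask, dtype=bool).ravel()
    coef, *_ = np.linalg.lstsq(design[use], z[use], rcond=None)
    # Evaluate the fitted surface everywhere, including masked-out text pixels.
    return (design @ coef).reshape(h, w)
```

A global cubic has only ten coefficients, so it cannot follow sharp local shading changes; a sliding-window filter trades that rigidity for more computation.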
Figures 6-7 Binarization of documents in Figure
1(b-c)
Table 1 below compares the methods in terms of thresholding speed and
character segmentation rate. As Table 1 shows, the proposed technique
significantly outperforms the other three methods.
Table 1 Binarization performance compared with
other methods
Figure 3 Second round cubic smoothing surface
The global document shading can thus be corrected using the estimated
shading variation as follows:
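The exact correction formula is not recoverable from this text. A common way to flatten shading, assumed here, is multiplicative (flat-field style) correction: each pixel is divided by the estimated background surface and rescaled so that the background maps to a hypothetical `target` gray level. The function name `compensate_shading` and the default target of 200 are illustrative assumptions.

```python
import numpy as np


def compensate_shading(gray: np.ndarray, shading: np.ndarray,
                       target: float = 200.0) -> np.ndarray:
    """Flatten shading by dividing by the estimated background surface.

    ASSUMPTION: multiplicative correction; the background maps to `target`
    and text keeps its contrast ratio relative to the local background.
    """
    gray = np.asarray(gray, dtype=float)
    shading = np.maximum(np.asarray(shading, dtype=float), 1.0)  # avoid division by ~0
    return np.clip(gray / shading * target, 0.0, 255.0)
```

With a ratio correction, a text pixel that is 30 percent as bright as its local background comes out at 0.3 * target everywhere on the page, which is why a single global threshold can work afterwards.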
Conclusions
1. A fast and efficient document thresholding technique is proposed that
binarizes badly illuminated document images through shading estimation and
shading correction.
2. The proposed technique works particularly well for documents with large
areas of uniformly colored background, such as text documents, maps, and
engineering drawings.