Presentation for CBDAR 2005 - PowerPoint PPT Presentation

1
Presentation for CBDAR 2005
Camera Document Restoration for OCR
Shijian Lu, Chew Lim Tan
School of Computing, National University of Singapore
2
  • Introduction
  • New Document Capture Method
  • Traditionally, document scanners have been widely used
    for document capture. As sensor resolution has
    increased in recent years, high-speed non-contact
    text capture through a digital camera or mobile
    phone is becoming an alternative for
    document capture and digitization.
  • The Related Problems
  • Unlike document images captured through a
    document scanner, images captured by a camera
    generally contain two new types of distortion:
    the perspective distortion introduced
    during the capture process and the geometric
    distortion resulting from the non-flat document
    surface on which the text lies. Both distortions must be
    removed before OCR.

3
  • Introduction

The sample images below show the two new types of
distortion.
Images captured through a digital camera: the one
on the left contains only perspective distortion,
while the one on the right contains both
perspective and geometric distortions
4
  • Introduction

Reported Methods
  • Perspective Rectification
  • 1. P. Clark, M. Mirmehdi
  • Requires the document boundary, labeled (2) in
    the figure on the previous slide
  • 2. C.R. Dance
  • Requires the column boundary, labeled (1) in the
    figure on the previous slide
  • Geometric Rectification
  • M. S. Brown, W. B. Seales
  • M. Pilu
  • Both require auxiliary hardware for 3D
    measurements

5
  • Document Image Recognition

Document Image Recognition
  • Text Line Segmentation
  • Vertical Stroke Boundary (VSB)
    Identification
  • Distortion Differentiation
  • Skew Detection and Correction
  • Perspective Distortion Detection and
    Rectification
  • Geometric Distortion Detection and
    Rectification
  • OCR Experimentation

6
  • Document Image Recognition

Text Line Extraction
Text line extraction is implemented through a
character tracing process, which assigns
characters to different text lines based on the
point-to-point and point-to-line distance
constraints illustrated below.
Character tracing process
7
  • Document Image Recognition

Text Line Extraction
With the classified character centroids, a group of
straight lines or conics can be fitted. The
figures below show the fitted lines.
Straight lines and conics fitted using classified
character centroids
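The fitting step above can be sketched as follows. This is a minimal illustration with NumPy, not the authors' implementation: the centroid coordinates are made up, and a second-degree polynomial stands in for the conic fit mentioned on the slide.

```python
import numpy as np

def fit_text_line(centroids, degree=1):
    """Least-squares fit y = p(x) through the centroids of one text line.

    degree=1 gives a straight line; degree=2 gives a parabola, used here
    as a simple stand-in for a general conic (an assumption).
    """
    pts = np.asarray(centroids, dtype=float)
    return np.polyfit(pts[:, 0], pts[:, 1], degree)

# Hypothetical centroids of one straight and one curved text line.
straight = [(0, 10.0), (10, 12.0), (20, 14.0), (30, 16.0)]
curved = [(0, 0.0), (10, 1.0), (20, 4.0), (30, 9.0)]

line_coeffs = fit_text_line(straight, degree=1)    # [slope, intercept]
curve_coeffs = fit_text_line(curved, degree=2)     # [a, b, c] of a*x^2 + b*x + c
```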
8
  • Document Image Recognition

VSB Identification
  • Stroke Boundary Extraction

(a)-(d) Four sets of structuring elements
customized for stroke boundary extraction
9
  • Document Image Recognition

VSB Identification
  • Stroke Boundary Extraction
  • Figures (e) and (f) give the left-side and
    right-side stroke boundaries extracted using the two
    equations below.

(a) Distorted character "b"; (b)-(d) stroke
boundaries extracted using the structuring elements
given in (a) of the figure on the previous slide; (e)-(f)
stroke boundaries determined using the two equations
on the left.
The two operator symbols in the equations represent the
erosion and XOR operations.
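As a rough illustration of combining erosion with XOR, the sketch below extracts left-side stroke boundaries from a binary image. The 1x2 horizontal structuring element is an assumption chosen for simplicity; the slides use four customized sets of structuring elements that are not reproduced in this transcript.

```python
import numpy as np

def erode_with_left_neighbor(img):
    """Binary erosion with a 1x2 element: a pixel survives only if it and
    its left neighbor are both foreground."""
    img = np.asarray(img, dtype=bool)
    left = np.zeros_like(img)
    left[:, 1:] = img[:, :-1]   # left[r, c] = img[r, c-1]; column 0 sees background
    return img & left

def left_stroke_boundary(img):
    """Foreground pixels whose left neighbor is background: image XOR erosion."""
    img = np.asarray(img, dtype=bool)
    return img ^ erode_with_left_neighbor(img)

# Hypothetical 2x5 binary stroke, three pixels wide.
stroke = np.array([[0, 1, 1, 1, 0],
                   [0, 1, 1, 1, 0]], dtype=bool)
boundary = left_stroke_boundary(stroke)
```

Mirroring the shift direction would give the right-side boundary in the same way.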
10
  • Document Image Recognition

VSB Identification
  • Stroke Boundary Extraction
  • To facilitate the identification of VSB,
    extracted stroke boundaries are filtered through
    a size filter first, which removes the stroke
    boundaries with small size.

Stroke boundary extraction: the figure on the
left gives the binarized sample word; the one in the
middle shows the stroke boundaries extracted from
the left side of the character strokes; the one on
the right gives the labeled stroke boundaries after
size filtering
11
  • Document Image Recognition

VSB Identification
  • Fuzzy Set Construction
  • The desired VSBs must be large, straight, and
    properly posed. Two fuzzy sets characterizing
    their size and linearity are first
    constructed to determine the VSB candidates. The
    desired VSBs are then identified based on their
    pose. The size set is constructed using
    Zadeh's S-function.

Parameters a and c are taken as one and two times
the average stroke boundary size. Parameter b
refers to the crossover point.
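For reference, the standard form of Zadeh's S-function can be sketched as below. The equation itself did not survive in this transcript, so the sketch assumes the textbook definition, in which the crossover point b is the midpoint of a and c; the average boundary size used in the example is hypothetical.

```python
def s_function(x, a, c):
    """Zadeh's S-function in its standard form, with crossover b = (a + c) / 2."""
    b = (a + c) / 2.0
    if x <= a:
        return 0.0
    if x <= b:
        return 2.0 * ((x - a) / (c - a)) ** 2
    if x < c:
        return 1.0 - 2.0 * ((x - c) / (c - a)) ** 2
    return 1.0

# Hypothetical average stroke boundary size of 10 pixels,
# so a = 10 and c = 20 as described on the slide.
avg_size = 10.0
size_membership = [s_function(size, avg_size, 2 * avg_size)
                   for size in (8, 12.5, 15, 17.5, 25)]
```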
12
  • Document Image Recognition

VSB Identification
Table 1: Constructed size sets (SMV: size
membership value)
  • Fuzzy Set Construction
  • For the labeled stroke boundaries of the sample
    word "laboratory", the membership values of the
    size set can be determined using Zadeh's
    S-function.

13
  • Document Image Recognition

VSB Identification
  • Fuzzy Set Construction
  • The linearity set is constructed based on the
    correlation coefficient of the least squares
    method.

$(x_i, y_i)$, $i = 1, \dots, n$, refers to the i-th extracted
boundary pixel. Parameters $\bar{x}$ and $\bar{y}$
represent the average x and y coordinates of the
extracted boundary pixels.
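A linearity measure of this kind can be sketched with the usual sample correlation coefficient. Taking the absolute value and treating an exactly vertical or horizontal run of pixels as perfectly straight are assumptions made here, since the slide's equation is not available in this transcript.

```python
import numpy as np

def linearity(pixels):
    """Absolute correlation coefficient of boundary pixel coordinates."""
    pts = np.asarray(pixels, dtype=float)
    dx = pts[:, 0] - pts[:, 0].mean()
    dy = pts[:, 1] - pts[:, 1].mean()
    denom = np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
    if denom == 0.0:
        # Assumption: a run with constant x or constant y is a straight line.
        return 1.0
    return float(abs((dx * dy).sum() / denom))

# Hypothetical boundary pixels: a slanted straight edge and a zigzag edge.
straight_edge = [(0, 0), (1, 2), (2, 4), (3, 6)]
zigzag_edge = [(0, 0), (1, 1), (0, 2), (1, 3)]
```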
14
  • Document Image Recognition

VSB Identification
Table 2: Constructed linearity sets (LMV:
linearity membership value)
  • Fuzzy Set Construction
  • For the labeled stroke boundaries of the sample
    word "laboratory", the membership values of the
    linearity set can be determined as the
    correlation coefficient of the least squares fitting
    method.

15
  • Document Image Recognition

VSB Identification
  • VSB Candidate Determination
  • The desired VSBs must be large and straight compared
    to other stroke boundaries. VSB candidates can
    thus be determined by combining the
    constructed size and linearity sets, which is
    carried out using a fuzzy aggregator.

S and L refer to the constructed size and
linearity sets. The compensation factor indicates where the
actual operator is located between union and
intersection. VSB candidates can thus be determined
based on the fact that the number of VSBs is generally
half the number of characters.
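One common aggregator with a compensation factor between intersection and union is the Zimmermann-Zysno compensatory operator; the sketch below assumes that form, which may differ from the operator on the original slide. The name gamma, the membership values, and the helper pick_candidates are illustrative only.

```python
def aggregate(s, l, gamma=0.5):
    """Compensatory aggregation of a size and a linearity membership value.

    gamma = 0 gives the intersection (product);
    gamma = 1 gives the union (algebraic sum).
    """
    intersection = s * l
    union = 1.0 - (1.0 - s) * (1.0 - l)
    return intersection ** (1.0 - gamma) * union ** gamma

def pick_candidates(size_mv, lin_mv, n_chars, gamma=0.5):
    """Keep the boundaries with the highest aggregated membership,
    as many as half the number of characters."""
    amv = {k: aggregate(size_mv[k], lin_mv[k], gamma) for k in size_mv}
    ranked = sorted(amv, key=amv.get, reverse=True)
    return sorted(ranked[: max(1, n_chars // 2)])

# Hypothetical membership values for four labeled stroke boundaries
# in a four-character word.
size_mv = {1: 0.9, 2: 0.2, 3: 0.8, 4: 0.1}
lin_mv = {1: 0.95, 2: 0.9, 3: 0.3, 4: 0.2}
candidates = pick_candidates(size_mv, lin_mv, n_chars=4)
```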
16
  • Document Image Recognition

VSB Identification
  • Fuzzy Set Construction
  • For the labeled stroke boundaries of the sample
    word "laboratory", the membership values of the
    aggregation set can be determined using the
    aggregator operation. Based on the fact that the number
    of VSBs is generally half the number of characters,
    stroke boundaries 1, 6, 11, 19, and 20 are
    determined as the VSB candidates.

Table 3: Constructed aggregation sets (AMV:
aggregation membership value)
17
  • Document Image Recognition

VSB Identification
The desired VSBs can finally be identified through
the use of the pose value, which is determined as the
slope of the straight line fitted using the
determined VSB candidates. The pose expectation
for each determined VSB candidate is
determined from its nearest neighbors.
Parameter k represents the number of nearest
neighbors of the VSB candidate under study; it can
be taken as a number from 3 to 6.
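The neighbor-based check can be sketched as follows. The slide's equation is not available here, so this assumes the pose expectation is the mean pose value of the k nearest candidates (nearest by horizontal position) and that a candidate is rejected when it deviates from that expectation by more than a tolerance; positions, pose values, and the tolerance are all hypothetical.

```python
def pose_expectation(idx, positions, poses, k=3):
    """Mean pose value of the k candidates nearest to candidate idx."""
    others = [j for j in range(len(poses)) if j != idx]
    others.sort(key=lambda j: abs(positions[j] - positions[idx]))
    nearest = others[:k]
    if not nearest:
        return poses[idx]
    return sum(poses[j] for j in nearest) / len(nearest)

def filter_by_pose(positions, poses, k=3, tol=0.5):
    """Indices of candidates whose pose is close to their expectation."""
    return [i for i in range(len(poses))
            if abs(poses[i] - pose_expectation(i, positions, poses, k)) <= tol]

# Hypothetical x positions and pose values of five VSB candidates;
# the last one is posed very differently from its neighbors.
positions = [10, 40, 70, 100, 130]
poses = [1.50, 1.52, 1.48, 1.51, 0.60]
kept = filter_by_pose(positions, poses, k=3, tol=0.5)
```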
18
  • Document Image Recognition

VSB Identification
  • Pose value determination
  • For the labeled stroke boundaries of the sample
    word "laboratory", the pose value can be
    determined as the slope of the straight line
    fitted using the determined VSB candidates. Based
    on the pose expectation, stroke boundary 20 is
    rejected, as its pose value deviates far from
    that of the neighboring VSB candidates.

Table 4: Calculated pose values (PV: pose value)
19
  • Document Image Recognition

VSB Identification
The table below gives an overview of the
proposed VSB identification process
20
  • Document Image Recognition

VSB Identification
  • VSB Candidate Determination
  • Based on the aggregation values given in the
    table, stroke boundaries 1, 6, 11, 19, and 20 are
    determined as VSB candidates for the following
    example. VSB candidate 20 is then rejected
    based on its pose.

VSB identification process: the figure on the
left gives the labeled stroke boundaries; the one in
the middle shows the determined vertical stroke
boundary candidates; the one on the right gives
the identified VSBs
21
  • Document Image Recognition

Distortion Differentiation
Document images with skew or perspective
distortion can first be differentiated from
those with geometric distortion based on the
linear fitting error, which can be evaluated
using the following distance,
where parameters n and m refer to the number of
fitted middle lines and the number of
character centroids within the i-th classified
character centroid category. Parameter $L_i$ refers
to the middle line fitted to the character
centroids within the i-th category. Function
$Dist(C_j, L_i)$ calculates the distance between the
j-th character centroid $C_j$ within the i-th
character centroid category and $L_i$.
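A linear fitting error of this kind can be sketched as below. The normalization is an assumption: the sketch averages the perpendicular point-to-line distances over all centroids, which may differ from the slide's exact formula. The centroid data are made up.

```python
import numpy as np

def point_line_distance(pt, slope, intercept):
    """Perpendicular distance from pt to the line y = slope * x + intercept."""
    x, y = pt
    return abs(slope * x - y + intercept) / np.hypot(slope, 1.0)

def linear_fitting_error(categories):
    """Mean distance between character centroids and their fitted middle line.

    categories: one list of (x, y) centroids per text line.
    """
    total, count = 0.0, 0
    for centroids in categories:
        pts = np.asarray(centroids, dtype=float)
        slope, intercept = np.polyfit(pts[:, 0], pts[:, 1], 1)
        for pt in pts:
            total += point_line_distance(pt, slope, intercept)
            count += 1
    return total / count

# Hypothetical centroids: two straight text lines versus one curved line.
flat_page = [[(0, 10), (10, 10), (20, 10)], [(0, 30), (10, 31), (20, 32)]]
curved_page = [[(0, 0), (10, 1), (20, 4), (30, 9)]]
```

A small error suggests skew or perspective distortion only; a large error suggests a non-flat surface.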
22
  • Document Image Recognition

Distortion Differentiation
Skew and perspective distortions can be further
differentiated from each other based on the relative
orientation of the fitted middle lines, which can
be evaluated with the following equation,
where parameter n refers to the number of
fitted middle lines. Parameter $\theta_i$ refers to the
orientation angle of the i-th fitted middle line.
For skewed document images, the relative
orientation RO is quite close to zero, whereas for
perspective document images RO is normally
much larger.
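One way to quantify the relative orientation is the mean absolute deviation of the line angles from their average. This is an assumed stand-in for the slide's equation, which is not available in this transcript; the angles below are hypothetical and given in radians.

```python
def relative_orientation(angles):
    """Mean absolute deviation of text line angles from their average."""
    mean = sum(angles) / len(angles)
    return sum(abs(a - mean) for a in angles) / len(angles)

# Skewed page: all lines share one angle. Perspective page: lines fan out.
skewed_angles = [0.2, 0.2, 0.2]
perspective_angles = [0.10, 0.15, 0.20, 0.25]
ro_skew = relative_orientation(skewed_angles)
ro_perspective = relative_orientation(perspective_angles)
```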
23
  • Document Image Recognition

Skew Detection and Correction
Skew distortion can be removed simply, based on
text line information. The skew angle can be
estimated from the slope of the fitted middle
line of the text lines.
Here the variables are the x and y coordinates
of the character centroids within the i-th set,
together with the averages of those x and y coordinates.
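The slope-based estimate can be sketched with the ordinary least-squares slope of the centroids in one text line. The function name and the sample centroids are illustrative, and the sketch is not necessarily identical to the slide's equation.

```python
import math

def skew_angle(centroids):
    """Angle (radians) of the least-squares line through the centroids."""
    n = len(centroids)
    mean_x = sum(x for x, _ in centroids) / n
    mean_y = sum(y for _, y in centroids) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in centroids)
    sxx = sum((x - mean_x) ** 2 for x, _ in centroids)
    # atan2 equals atan(sxy / sxx) for sxx > 0 and avoids dividing by zero.
    return math.atan2(sxy, sxx)

# Hypothetical centroids lying on a 45-degree text line.
angle_deg = math.degrees(skew_angle([(0, 0), (1, 1), (2, 2)]))
```

Rotating the image by the negative of this angle would then remove the skew.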
24
  • Document Image Recognition

Skew Detection and Correction
We propose to exploit character eigen-points to
detect the upside-down situation, in which the skew angle
is larger than 90 degrees or smaller than -90
degrees. For each character, eigen-points are
defined as the highest and lowest points in the
direction orthogonal to the orientation of the fitted
middle line of the text line.
Detected character eigen-points
25
  • Document Image Recognition

Skew Detection and Correction
The orientation of text lines can thus be
determined based on the fact that the number of
ascenders is much larger than that of descenders
in Roman letters. The top line and base line can
also be fitted using the eigen-points of
characters with no ascender or descender.
Detected character ascenders and descenders, and the
top line and base line fitted using the
eigen-points of characters with no ascender or
descender
26
  • Document Image Recognition

Skew Detection and Correction
Document images with upside down skew can thus be
detected and restored
27
  • Document Image Recognition

Perspective Distortion Detection and Correction
We propose to correct the perspective distortion
through quadrilateral correspondence
construction. The source quadrilateral is
determined using the top line, the base line, and the
identified VSBs. The target rectangle is
determined based on the number of characters
enclosed within the source quadrilateral and an
approximate character aspect ratio of 1:1.
Homography determination: the figure on the left
gives the quadrilateral determined based on the
top line and base line; the one on the right gives
the estimated target rectangle
28
  • Document Image Recognition

Perspective Distortion Detection and Correction
With four point correspondences, the homography
can be estimated using the following equation,
where the four point correspondences are given in
the figure on the previous slide
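Estimating a homography from four point correspondences can be sketched with the standard direct linear transform (DLT). This is a generic formulation rather than necessarily the equation on the slide, and the quadrilateral and rectangle corners are hypothetical.

```python
import numpy as np

def homography_from_4_points(src, dst):
    """Solve for the 3x3 homography mapping each src point to its dst point."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    A = np.asarray(rows, dtype=float)
    # The homography is the null-space vector of A: the last right singular vector.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]   # assumes H[2, 2] is nonzero for the given points

def apply_homography(H, pt):
    """Map one (x, y) point through H using homogeneous coordinates."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w

# Hypothetical source quadrilateral and target rectangle corners.
src = [(10, 20), (200, 40), (190, 120), (20, 90)]
dst = [(0, 0), (180, 0), (180, 60), (0, 60)]
H = homography_from_4_points(src, dst)
```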
29
  • Document Image Recognition

Perspective Distortion Detection and Correction
Multiple homographies can thus be determined. The
best one minimizes the following distance,
where m is the number of detected text lines and
n is the number of identified VSBs. $Sl_i$ is the
orientation of the i-th restored text line and $S_{avg}$ is
the average orientation. $ptx_j$ and $pbx_j$ represent
the x coordinates of the two vertices of the j-th restored VSB,
and the component $abs((ptx_j - pbx_j) / Dist_{avg})$ is
the normalized horizontal distance
between the vertices of that VSB.
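A selection criterion built from these quantities can be sketched as below. The sketch assumes the two terms are simply summed, which may not match the slide's exact formula; the function and argument names and the candidate values are hypothetical.

```python
def rectification_cost(line_orientations, vsb_top_x, vsb_bottom_x, dist_avg):
    """Deviation of restored text lines from their mean orientation, plus
    the normalized horizontal offset between the two ends of each VSB."""
    s_avg = sum(line_orientations) / len(line_orientations)
    line_term = sum(abs(s - s_avg) for s in line_orientations)
    vsb_term = sum(abs((pt - pb) / dist_avg)
                   for pt, pb in zip(vsb_top_x, vsb_bottom_x))
    return line_term + vsb_term

def best_candidate(candidates):
    """Index of the candidate homography with the smallest cost."""
    return min(range(len(candidates)),
               key=lambda i: rectification_cost(*candidates[i]))

# Hypothetical measurements after applying two candidate homographies:
# (line orientations, VSB top x, VSB bottom x, average distance).
candidates = [
    ([0.02, -0.02], [5.0, 9.0], [6.0, 8.0], 10.0),
    ([0.0, 0.0], [5.0, 9.0], [5.0, 9.0], 10.0),
]
best = best_candidate(candidates)
```

A well-rectified image has parallel text lines and upright VSBs, so both terms approach zero.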
30
  • Document Image Recognition

Perspective Distortion Detection and Correction
With the optimal homography, camera documents
with perspective distortion can be rectified.
Distorted and corrected camera document
31
  • Document Image Recognition

Geometric Distortion Detection and Correction
With the detected top line and base line and the
identified VSBs, camera documents with geometric
distortion can be segmented into multiple smaller
patches.
(a) Fitted top line and base line and identified
VSBs; (b)-(c) VSB processing; (d) segmentation of the
distorted sample word
32
  • Document Image Recognition

Geometric Distortion Detection and Correction
Based on features including character span,
character ascenders and descenders, and character
intersection numbers, characters can be
classified into 6 categories with 6 different
width-height ratios.
Table 6: The classification of characters and
their width-height ratios
33
  • Document Image Recognition

Geometric Distortion Detection and Correction
Based on the aspect ratios $R_i$, the width of the
target rectangle can thus be determined,
where $VSB_{avg}$ represents the average size of the
identified VSBs and parameter n represents the
number of characters and inter-word blanks
enclosed within the partitioned image patch.
34
  • Document Image Recognition

Geometric Distortion Detection and Correction
From the segmented small image patches, target
rectangles can be restored. Based on the vertices
of the corresponding quadrilaterals, a
rectification homography can be estimated for
each segmented patch, and camera documents with
geometric distortion can be rectified patch by
patch.
The segmentation of text lines with geometric
distortion
35
  • Document Image Recognition

Geometric Distortion Detection and Correction
The figure below gives the segmentation of camera
document with geometric distortion
Camera document sample and the segmentation result
36
  • Document Image Recognition

Geometric Distortion Detection and Correction
The figure below gives the restored target
rectangles and the corrected document image
Restored target rectangles and rectified document
37
  • OCR Experimentation

Restored text images are then input to an OCR
engine. 150 sample images with skew, perspective
and geometric distortions are tested.
38
  • Conclusion-Hardware Outlook
  • Document processing may be embedded into
    future mobile devices.
  • This is possible because the resolution of camera
    sensors on mobile phones, PDAs, and webcams has
    greatly improved.
  • Some dedicated mobile document analysis chips
    have been designed and are available on the market.
  • Camera documents may be analyzed and
    recognized through a menu command or a simple
    button click.

39
  • Conclusion-Research Direction
  • For analysis and understanding of documents
    captured by mobile devices:
  • Skew, perspective and geometric (non-flat
    document) distortions are inevitable in real
    applications, as images are often captured in a
    hurry. Therefore, restoration is always required.
  • One research direction is to develop
    recognition algorithms that tolerate
    distortion and so bypass the restoration process.

40
  • Conclusion-Software Embedding
  • Currently, the proposed technique works well on
    a desktop computer. It has the potential to be
    embedded into mobile devices.
  • The restoration is fairly fast: the
    process takes around 2-4 seconds for 640×480
    document images. The speed can be further
    improved through code optimization.
  • The proposed method requires only a single
    document image captured by the mobile device.
    There is no large memory requirement.

41
  • Conclusion-Future
  • Future Outlook
  • With the improved resolution of webcams, we envisage
    rapid remote document capture and text
    dissemination through webcams, for subsequent
    information processing.
  • With the availability of mobile document chips on the
    market, document restoration and recognition
    algorithms can be adapted and embedded into
    mobile devices such as mobile phones and PDAs.