Document Image Processing

Provided by: meenaksh

1
Document Image Processing
  1. Fourier Transforms
  2. Hough Transforms
  3. Docstrum
  4. Text vs Graphics

2
Features
Hamming Distance, HD
Correlation
Central Moments
Spread
Slenderness
Fourier descriptors
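
The formulas for these features did not survive in the transcript. As a minimal sketch, the two functions below use the standard textbook definitions of Hamming distance and central moments for binary images stored as lists of 0/1 rows. The function names and the image representation are assumptions made for illustration, not the presenter's notation.

```python
def hamming_distance(a, b):
    """Count pixel positions where two same-sized binary images differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def central_moment(img, p, q):
    """Central moment mu_pq of the black (1) pixels about their centroid."""
    pts = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("image contains no black pixels")
    n = len(pts)
    cx = sum(x for x, _ in pts) / n  # centroid x
    cy = sum(y for _, y in pts) / n  # centroid y
    return sum((x - cx) ** p * (y - cy) ** q for x, y in pts)


if __name__ == "__main__":
    a = [[0, 1], [1, 1]]
    b = [[1, 1], [0, 1]]
    print(hamming_distance(a, b))
    print(central_moment([[1, 1, 1]], 2, 0))
```

Because central moments are taken about the centroid, they do not change when the shape is translated, which is why they are commonly used as shape features.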
3
Fourier Transform

4
Document Images and FT
5
Hough Transform
  • Parametric Form
  • Global
  • Peaks in Accumulator Space
  • In slope-intercept form, the y-intercept is infinite
    for vertical lines
  • Use the normal form (r, θ): r = x cos θ + y sin θ

6
[Figure: accumulator array over (r, θ); the r axis runs from -r to r and the θ axis from 0° to 180°]
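
A minimal sketch of voting in an (r, θ) accumulator, assuming integer pixel coordinates, a θ step of 1° and an r step of 1 pixel. The function names and the list-of-lists accumulator are illustrative choices, not taken from the slides.

```python
import math


def hough_accumulator(points, r_max):
    """Vote in (r, theta) space.

    Rows index r from -r_max to r_max; columns index theta in whole
    degrees from 0 to 179.
    """
    acc = [[0] * 180 for _ in range(2 * r_max + 1)]
    for x, y in points:
        for t in range(180):
            theta = math.radians(t)
            r = round(x * math.cos(theta) + y * math.sin(theta))
            if -r_max <= r <= r_max:
                acc[r + r_max][t] += 1
    return acc


def strongest_line(acc, r_max):
    """Return (r, theta_degrees, votes) of the accumulator peak."""
    votes, ri, t = max(
        (v, ri, t) for ri, row in enumerate(acc) for t, v in enumerate(row)
    )
    return ri - r_max, t, votes


if __name__ == "__main__":
    # 100 collinear points on the horizontal line y = 20.
    pts = [(x, 20) for x in range(100)]
    print(strongest_line(hough_accumulator(pts, r_max=150), r_max=150))
```

Each point votes for every line that could pass through it, so collinear points pile their votes into a single cell. That cell is the accumulator peak referred to on the previous slide.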
8
Document Skew
  • Adjust a binary image f(x,y) into portrait mode,
    if necessary.
  • For each row of the binary image f(x,y), generate
    and label its black runs.
  • Build objects based on black runs of the same
    labels and update extreme coordinates of objects.
  • Create a simplified binary image by preserving
    the last black runs of each "allowable" object.
  • Apply the Hough transform on the simplified
    binary image.
  • Analyze the local maxima of the Hough accumulator
    cell array to detect the skew angle of the binary
    image as follows:
  • (a) Collect the first and second maxima of the Hough
    accumulator cell array.
  • (b) Collect all Hough accumulator cells whose
    values are greater than one-half of the second
    maximum.
  • (c) Add these values together, grouped by angle.
  • (d) The skew angle is the angle corresponding to
    the maximum of these sums.

D. X. Le and G. Thoma, "Document Skew Angle Detection Algorithm,"
Proc. 1993 SPIE Symposium on Aerospace and Remote Sensing:
Visual Information Processing II, Orlando, FL, April 14-16, 1993,
Vol. 1961, pp. 251-262.
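
Steps (a) to (d) of the peak analysis can be sketched as follows. This is one reading of the slide, not the published implementation: it assumes "first and second maxima" means the two largest cell values, and that the accumulator is a list of rows indexed as acc[r_index][theta_index]. The function name is hypothetical.

```python
def skew_angle_index(acc):
    """Pick the theta index favored by steps (a)-(d) of the slide."""
    # (a) The two largest accumulator values (assumed interpretation).
    values = sorted((v for row in acc for v in row), reverse=True)
    second_max = values[1]
    # (b) Keep only cells above one-half of the second maximum.
    threshold = second_max / 2.0
    # (c) Sum the surviving cell values per angle.
    n_theta = len(acc[0])
    sums = [0] * n_theta
    for row in acc:
        for t, v in enumerate(row):
            if v > threshold:
                sums[t] += v
    # (d) The skew angle is the angle with the largest sum.
    return max(range(n_theta), key=lambda t: sums[t])


if __name__ == "__main__":
    acc = [
        [1, 9, 2],
        [0, 8, 3],
        [7, 1, 10],
    ]
    print(skew_angle_index(acc))
```

In the small example the single largest cell sits in the last column, but the middle column wins once strong cells are summed per angle. Summing across r rewards an angle shared by many text lines rather than one long line.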
9
Physical Layout Structure
  • The description of the physical layout is
    constructed from the information extracted from
    the document image during page segmentation and
    classification
  • Information about the position, dimensions,
    shape, etc.
  • Geometrical relationships between components
  • E.g., for a technical article it may contain
    information such as: a text component with a set
    S1 of attributes lies below a line drawing
    component with a set S2 of attributes
  • The Office Document Architecture (ODA)
  • The Standard Generalized Markup Language (SGML)
  • The eXtensible Markup Language (XML)
10
Docstrum
  • Slope histograms
  • Use local information
  • Connect a mark (component) with its K (4..6)
    nearest neighbors
  • Histogram of the slopes
  • More efficient than projection profiles
  • The docstrum is the radius and angle plot of
    these neighbor connections
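
A minimal sketch of the nearest-neighbor step, assuming each mark is reduced to its centroid and that a brute-force neighbor search is acceptable. The function names, the default k = 5 (within the 4..6 range on the slide) and the 1° histogram bins are illustrative assumptions.

```python
import math


def docstrum_pairs(centroids, k=5):
    """Return (distance, angle_degrees) for each mark and its k nearest neighbors.

    Angles are folded into [0, 180) so that a connection and its reverse
    count as the same slope.
    """
    pairs = []
    for i, (x0, y0) in enumerate(centroids):
        neighbors = sorted(
            (math.hypot(x1 - x0, y1 - y0), x1 - x0, y1 - y0)
            for j, (x1, y1) in enumerate(centroids)
            if j != i
        )[:k]
        for d, dx, dy in neighbors:
            angle = math.degrees(math.atan2(dy, dx)) % 180.0
            pairs.append((d, angle))
    return pairs


def dominant_angle(pairs, bin_width=1.0):
    """Peak of the slope histogram, returned as the lower edge of its bin."""
    hist = {}
    for _, angle in pairs:
        b = int(angle // bin_width)
        hist[b] = hist.get(b, 0) + 1
    return max(hist, key=hist.get) * bin_width


if __name__ == "__main__":
    # Three unskewed text lines: marks 10 apart, lines 30 apart.
    marks = [(10 * i, 30 * j) for j in range(3) for i in range(8)]
    print(dominant_angle(docstrum_pairs(marks, k=2)))
```

Plotting the returned (distance, angle) pairs in polar form gives the docstrum itself. The histogram peak over angle estimates the text-line orientation.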
11
Extracting Text Strings
  1. Set H to the average height of the marks being
     considered
  2. Set the Hough space resolution in θ to 1° and
     in r to 0.2H
  3. Apply the Hough transform to all marks, using a
     restricted range of θ
  4. Set the mark count threshold T = 20
  5. For each cell in the accumulator space with count
     greater than T
     a) For the cluster of cells, calculate the
        average height Hlocal of marks contributing to
        the cluster
     b) Compute a new clustering factor f = Hlocal/R
        and re-cluster cells
     c) Perform string segmentation on marks
        contributing to new clusters
  6. Update the Hough transform by deleting
     contributions from components discarded in
     step 5c above
  7. Decrement T by 1, and if T > 2 go to step 5
  8. Compute the Hough transform for the entire
     range of θ and go to step 4

12
Physical Layout Analysis
  • A document contains the information that its
    author wishes to convey, expressed through the
    formatting of characters and pictorial
    information and the general layout of the
    document
  • The shape and size of paragraphs and
    illustrations, the font of the characters, as
    well as their positions in the page, can carry a
    message
  • The physical layout denotes the organization of
    the text and graphical components in the document
  • Physical layout analysis comprises page
    segmentation, page classification, and physical
    layout structure extraction
13
Page Segmentation
  • Page segmentation is the identification of
    areas of interest in the document image
  • Identifies the boundaries of the areas in the
    image that correspond to the printed regions on
    the page
  • A higher-level description of the page is
    obtained, in terms of the outlines (contours)
    of these areas
  • Methods:
    Connected components aggregation (bottom-up)
    Projection-profiles analysis (top-down)
    Analysis of background space (hybrid)
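
As an illustration of the top-down, projection-profile approach, the sketch below computes a horizontal profile and cuts it wherever a run of empty rows occurs. The image representation (lists of 0/1 rows), the function names and the min_gap parameter are assumptions made for this example.

```python
def horizontal_profile(img):
    """Number of black (1) pixels in each row of a binary image."""
    return [sum(row) for row in img]


def split_on_gaps(profile, min_gap=1):
    """Return (start, end) bands, end exclusive, of non-empty profile entries.

    Two bands are separated only when at least min_gap consecutive
    entries are empty.
    """
    bands, start, gap = [], None, 0
    for i, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                bands.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:
        bands.append((start, len(profile) - gap))
    return bands


if __name__ == "__main__":
    img = [
        [0, 0, 0, 0],
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
    ]
    print(split_on_gaps(horizontal_profile(img)))
```

Applying the same split to the column profile of each band, and alternating between the two directions, is the idea behind recursive X-Y cut segmentation.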
14
Page Classification
  • Page classification is the determination of the
    type of the contents of each area of interest in
    the document image
  • Analyze attributes of the contents of each area
    and deduce its type
  • In OCR applications, one is interested in text
    and non-text
  • In graphics applications, the non-text areas must
    be further classified as line drawing, halftone,
    photograph, etc.
  • Texture analysis method: a matrix p(i,j) is
    formed, representing the number of times the
    image contains a horizontal run of length j whose
    black and white proportion is in category i
  • Categories are bins: (i) less than 10, (ii) 10-20,
    etc.
15
Logical Structure
  • Semantic structure, e.g., find abstracts of all
    papers in a database which include a keyword
  • Physical structure of a newspaper: extraction of
    blocks of text, graphics, half-tones;
    identification of attributes such as fonts,
    size, style
  • Logical structure: identifying headlines,
    captions, bylines; grouping paragraphs belonging
    to the same story across columns, pages, etc.
  • HTML vs XML; physical vs logical
16
Physical Layout Analysis (Lit Survey)
  • Wahl et al.: Closing with a horizontal kernel
    (300) AND closing with a vertical kernel (30)
  • Nagy and Seth: X-Y tree
  • Fisher et al.: Low-resolution image used
  • Lebourgeois et al.: Non-uniform down-sampling;
    dilation by a horizontal kernel
  • Bloomberg: Vertical dilation followed by a
    close-open sequence to remove noise, followed by
    a hit-or-miss transform to identify seed points
    of characters, to identify italic and bold fonts
  • Saitoh and Pavlidis: Non-uniform down-sampling
  • Hinds et al.: Erosion using a 2-pixel vertical
    (horizontal) kernel, followed by Hough transform
  • Pavlidis and Zhou: Projection profile and
    clustering
  • Amamoto et al.: Open white space with a long
    horizontal structuring element, followed by
    vertical, and take the union
  • O'Gorman: Docstrum
  • Ishitani: Document skew using line complexities
    to take care of non-text blocks
  • Chen and Haralick: Recursive opening and closing
  • Akindele and Belaid: Permit non-rectangular
    blocks
17
Logical Layout Analysis (Lit Survey)
  • Tsujimoto and Asada: Rule-based system
  • Fisher: Rule-based system
  • Chenevoy and Belaid: Blackboard system
  • Kreich et al.: Top-down knowledge-based system
  • Derrien-Peden: Frame-based system
  • Yamashita et al.: Model-based method
  • Dengel: Business letters