Connected Component Level Method Identification in Automatic Titleboard Indexing

Slides: 21
Provided by: sjpi

1
Connected Component Level Method Identification
in Automatic Titleboard Indexing
  • Samuel James Pinson
  • Mark Pinson
  • Dr. William Barrett
  • Computer Science Department
  • Brigham Young University

2
Connected Component Level Method Identification
  • Distinguishing between machine print and
    handwriting
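
As an illustration of what "connected component level" means, the sketch below groups touching foreground pixels of a binary image into components and computes a bounding box for each. It is a minimal sketch, not the presenters' implementation: the 8-connectivity rule, the list-of-rows image format, and the function names `connected_components` and `bounding_box` are all assumptions made for this example.

```python
from collections import deque


def connected_components(image):
    """Group 8-connected foreground (1) pixels of a binary image (list of rows)."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def bounding_box(pixels):
    """Return (top, left, bottom, right) of a component's pixels."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return min(ys), min(xs), max(ys), max(xs)


# Tiny hypothetical image with two separate blobs.
image = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
]
components = connected_components(image)
boxes = [bounding_box(p) for p in components]
```

Each component (roughly one character or stroke group) could then be classified on its own as machine print or handwriting.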

3
Microfilm Titleboards
  • Type
  • Location
  • Time
  • Acquisition

4
Microfilm Titleboards
5
Problem Statement
  • To make genealogical microfilm more accessible by
    automatically building a searchable index over
    titleboards
  • Preprocess titleboards
  • Distinguish between machine print and handwriting
    (Method identification)
  • Recognize machine print and handwriting (OCR)
  • Build searchable index

6
Method Identification
  • Discriminator Between Handwritten and
    Machine-Printed Characters (Umeda et al., 1990)
  • U.S. Patent 4,910,787

[Figure labels: Handwriting; Slanted, Horizontal, Vertical]
7
Method Identification
  • Separating Handwritten Material from Machine
    Printed Text Using Hidden Markov Models (Guo et
    al., 2001)

[Figure: paired values 50/50, 55/45, 38/62, 33/67, 21/79, 23/77; labels: Machine Print, Handwriting]
8
Method Identification
  • Machine Printed Text and Handwriting
    Identification in Noisy Document Images (Zheng et
    al., 2004)

[Figure labels: Noise, Handwriting, Machine Print]
9
Method Identification
  • Eigenfaces (Turk and Pentland, 1991)

[Figure: an N-by-N image and its N^2-element pixel vector; sample values 0, 71, 250, 68, 210, 44, 128, 53]
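
The eigenfaces idea treats each N-by-N image as a point in N^2-dimensional space and describes a class of images by its principal directions. The sketch below is a minimal pure-Python illustration under several assumptions: it keeps only the single leading direction, it finds that direction by power iteration (swapped in here for a full eigendecomposition), and the names `fit_eigenspace` and `distance_from_eigenspace` are hypothetical. It is not the presenters' code.

```python
import math


def flatten(image):
    """Turn an N x N image (list of rows) into a length-N^2 vector."""
    return [float(p) for row in image for p in row]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_eigenspace(vectors, iterations=50):
    """Return (mean, unit leading principal direction) of the training vectors."""
    count = len(vectors)
    mean = [sum(col) / count for col in zip(*vectors)]
    centered = [[x - m for x, m in zip(v, mean)] for v in vectors]
    # Arbitrary non-uniform start vector (an assumption), normalized to unit length.
    v = [float(i + 1) for i in range(len(mean))]
    start_norm = math.sqrt(dot(v, v))
    v = [vi / start_norm for vi in v]
    # Power iteration on the scatter matrix, applied implicitly as
    # sum_i (x_i . v) x_i so the N^2 x N^2 matrix is never formed.
    for _ in range(iterations):
        w = [0.0] * len(mean)
        for x in centered:
            coeff = dot(x, v)
            w = [wi + coeff * xi for wi, xi in zip(w, x)]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:  # no variance along v; keep the current direction
            break
        v = [wi / norm for wi in w]
    return mean, v


def distance_from_eigenspace(vector, mean, component):
    """Length of the part of the centered vector the component cannot explain."""
    centered = [x - m for x, m in zip(vector, mean)]
    coeff = dot(centered, component)
    residual = [c - coeff * e for c, e in zip(centered, component)]
    return math.sqrt(dot(residual, residual))


# Hypothetical 2 x 2 training images that vary along one direction only.
training = [flatten(img) for img in (
    [[3, 3], [7, 7]],
    [[5, 5], [5, 5]],
    [[7, 7], [3, 3]],
)]
mean, component = fit_eigenspace(training)
near = distance_from_eigenspace(flatten([[4, 4], [6, 6]]), mean, component)
far = distance_from_eigenspace(flatten([[9, 5], [5, 5]]), mean, component)
```

An image resembling the training set lies close to the learned subspace (`near` is essentially zero), while a dissimilar one lies farther away (`far` is larger). A distance of this kind is one way a component could be compared against representative machine-print components.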
10
Method Identification
11
Method Identification
12
Method Identification
  • The Use of Eigenpictures for Optical Character
    Recognition (Muller and Herbst, 1998)

13
Method Identification
14
Method Identification
  • Determining a local distance threshold via radial
    density

Global Target Precision: 98
Local Precision: 100, 99, 92
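
The slide does not spell out how the local threshold is computed, so the following is only one plausible reading, stated as an assumption: for each representative machine-print component, grow a radius outward over labeled training samples and keep the largest radius at which the fraction of machine-print samples inside still meets the target precision. The function name `local_threshold` and the sample data are hypothetical.

```python
def local_threshold(distances_and_labels, target_precision):
    """Largest radius whose enclosed samples are at least target_precision machine print.

    distances_and_labels: (distance, is_machine_print) pairs measured from one
    representative component. Returns 0.0 if no radius qualifies.
    """
    ordered = sorted(distances_and_labels)
    best_radius = 0.0
    machine = 0
    for i, (distance, is_machine_print) in enumerate(ordered):
        machine += int(is_machine_print)
        # Samples at the same distance fall inside the radius together,
        # so only evaluate precision after the last sample of a tie.
        is_last_of_tie = i + 1 == len(ordered) or ordered[i + 1][0] != distance
        if is_last_of_tie and machine / (i + 1) >= target_precision:
            best_radius = distance
    return best_radius


# Hypothetical distances from one representative component.
samples = [
    (0.5, True), (0.8, True), (1.1, True),
    (1.4, False), (1.6, True), (2.0, False),
]
strict_radius = local_threshold(samples, 0.98)
loose_radius = local_threshold(samples, 0.75)
```

With a strict target the radius stops before the first handwriting sample. A looser target lets the radius extend further, which is the trade-off between a single global target and the precision achieved locally around each representative.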
15
Index Construction
[Figure: titleboard terms (Archivo de la Parroquia; BAUTISMO; 115; 1877; 1878; Red. 12-1; Fecha 9-27-1960) feeding an INDEX]
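
A searchable index over recognized titleboard text can be sketched as an inverted index that maps each term to the titleboards containing it. This is a minimal sketch under assumptions: the tokenizer (lowercase, split on non-word characters), the board ids, and the second titleboard are all invented for illustration, and `build_index` is a hypothetical name.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"\w+", text.lower())


def build_index(titleboards):
    """Map each term to the sorted list of titleboard ids that contain it."""
    index = defaultdict(set)
    for board_id, text in titleboards.items():
        for term in tokenize(text):
            index[term].add(board_id)
    return {term: sorted(ids) for term, ids in index.items()}


titleboards = {
    # Terms taken from the titleboard shown on this slide.
    "board-1": "Archivo de la Parroquia BAUTISMO 115 1877 1878 "
               "Red. 12-1 Fecha 9-27-1960",
    # Hypothetical second board, added only to show shared terms.
    "board-2": "Archivo de la Parroquia MATRIMONIO 1901",
}
index = build_index(titleboards)
```

Splitting on hyphens means the date 9-27-1960 is indexed as the separate terms 9, 27, and 1960, so a search for the year alone can find the board.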
16
Querying
<bautismo> AND <1960>
[Figure label: INDEX]
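
A conjunctive query of this form, written with each term in angle brackets and joined by AND, can be answered by intersecting the posting lists of its terms. The sketch below assumes an index shaped as a term-to-ids dictionary and handles AND only. The function name `run_and_query` and the film ids are hypothetical.

```python
import re


def run_and_query(query, index):
    """Return the ids found in the postings of every <term> in an AND query."""
    terms = re.findall(r"<([^<>]+)>", query)
    if not terms:
        return []
    result = set(index.get(terms[0].lower(), ()))
    for term in terms[1:]:
        # Intersect with each further term; unknown terms yield no hits.
        result &= set(index.get(term.lower(), ()))
    return sorted(result)


# Hypothetical postings: term -> ids of the titleboards that contain it.
index = {
    "bautismo": ["film-1", "film-3"],
    "1960": ["film-1", "film-2"],
}
hits = run_and_query("<bautismo> AND <1960>", index)
```

Only titleboards that contain both terms are returned, here the single id shared by the two posting lists.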
17
Results
18
Results
19
Future Work
  • Robust preprocessing and segmentation
  • Incorporate lexical, font, and style context
  • Metadata about indexed terms: script, language,
    meaning
  • Specialize the set of representative machine
    print connected components

20
Conclusions
  • Connected component level method identification
  • Progress towards automatic titleboard indexing