Imaged Document Text Retrieval without OCR

Transcript and Presenter's Notes


1
Imaged Document Text Retrieval without OCR
  • IEEE Trans. on PAMI, vol. 24, no. 6
  • June 2002

2
Outline
  • Introduction
  • HTD and VTD
  • Class of Character Objects
  • Similarity Measure of Documents
  • Experimental Results
  • Conclusions

3
Introduction
  • Retrieval of Imaged Documents
  • Processing with OCR vs. without OCR
  • Language dependence vs. language independence

4
Procedure
  • Image Preprocessing
  • Feature extraction of character objects
      • Horizontal Traverse Density (HTD)
      • Vertical Traverse Density (VTD)
  • Clustering
      • To identify classes of character objects
  • Document representation
      • Hash Table
      • N-Gram
      • To construct indexes for imaged document retrieval

5
Features HTD and VTD
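
The figure for this slide did not survive in the transcript, so the exact definition of the two features is not available here. As a minimal sketch, assuming "traverse density" means the number of times a scan line enters a stroke (a background-to-ink transition), HTD can be computed with one count per row and VTD with one count per column. The function names and the sample bitmap are hypothetical and chosen only for illustration.

```python
def traverse_density(lines):
    """Count background-to-ink transitions along each scan line.

    Assumption: 1 marks an ink pixel, 0 marks background, and the area
    outside the bitmap counts as background.
    """
    densities = []
    for line in lines:
        crossings = 0
        previous = 0
        for pixel in line:
            if pixel == 1 and previous == 0:
                crossings += 1
            previous = pixel
        densities.append(crossings)
    return densities


def htd_vtd(glyph):
    """Return (HTD, VTD) for a binary glyph given as a list of rows."""
    htd = traverse_density(glyph)                 # one value per row
    columns = [list(col) for col in zip(*glyph)]  # transpose rows into columns
    vtd = traverse_density(columns)               # one value per column
    return htd, vtd


# Hypothetical 5x5 bitmap loosely shaped like the letter "H".
GLYPH_H = [
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
]

htd, vtd = htd_vtd(GLYPH_H)
print(htd)  # [2, 2, 1, 2, 2]
print(vtd)  # [1, 1, 1, 1, 1]
```

Under this assumption the two vertical strokes of the "H" give two crossings on most rows, while the crossbar row gives one, which is the kind of shape signature the features are meant to capture without recognizing the character.
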
6
Class of Character Objects
  • Unsupervised clustering with HTD and VTD
  • Distance measure of character objects

7
Distance Measure of Character Objects
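
The formula on this slide was not preserved in the transcript. The sketch below is therefore not the presentation's distance measure; it only illustrates how a distance over HTD/VTD vectors can drive unsupervised grouping of character objects into classes. It assumes vectors are resampled to a common length, compared by mean absolute difference, and grouped by a simple leader-style clustering with a threshold. All names, the vector length, and the threshold are hypothetical.

```python
def resample(vector, length):
    """Stretch or shrink a vector to `length` entries (nearest-neighbour)."""
    n = len(vector)
    return [vector[i * n // length] for i in range(length)]


def vector_distance(a, b, length=8):
    """Mean absolute difference after resampling both vectors (assumed metric)."""
    ra, rb = resample(a, length), resample(b, length)
    return sum(abs(x - y) for x, y in zip(ra, rb)) / length


def object_distance(obj_a, obj_b):
    """Distance between two character objects given as (HTD, VTD) pairs."""
    return (vector_distance(obj_a[0], obj_b[0])
            + vector_distance(obj_a[1], obj_b[1]))


def cluster(objects, threshold):
    """Assign each object to the first class whose representative is close
    enough; otherwise start a new class with the object as representative."""
    representatives = []
    labels = []
    for obj in objects:
        for class_id, rep in enumerate(representatives):
            if object_distance(obj, rep) <= threshold:
                labels.append(class_id)
                break
        else:
            representatives.append(obj)
            labels.append(len(representatives) - 1)
    return labels


# Three hypothetical character objects: the first two are identical.
objects = [
    ([2, 2, 1, 2, 2], [1, 1, 1, 1, 1]),
    ([2, 2, 1, 2, 2], [1, 1, 1, 1, 1]),
    ([1, 1, 1, 1, 1], [1, 1, 3, 1, 1]),
]
print(cluster(objects, threshold=0.25))  # [0, 0, 1]
```

The resulting class labels, not recognized characters, are what the later document representation is built from.
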
8
Examples of Character Objects
9
Similarity Measure of Documents
  • N-Gram Algorithm
  • Cosine angle between two documents
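
As an illustration of the two bullets above, a document can be treated as a sequence of character-class labels, counted as n-grams, and compared with another document by the cosine of the angle between the two count vectors, cos θ = (a · b) / (|a| |b|). This is a minimal sketch: the choice n = 3, the use of a Python `Counter` in place of the hash table, and the sample sequences are assumptions, not details taken from the slides.

```python
import math
from collections import Counter


def ngram_counts(class_ids, n=3):
    """Count the n-grams in a sequence of character-class labels."""
    return Counter(
        tuple(class_ids[i:i + n]) for i in range(len(class_ids) - n + 1)
    )


def cosine_similarity(counts_a, counts_b):
    """Cosine of the angle between two n-gram count vectors."""
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[g] * counts_b[g] for g in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical class-label sequences for three documents.
doc_a = [1, 2, 3, 1, 2, 3]
doc_b = [1, 2, 3, 4]
doc_c = [4, 5, 6, 7]

a, b, c = ngram_counts(doc_a), ngram_counts(doc_b), ngram_counts(doc_c)
print(cosine_similarity(a, b))  # shares the n-gram (1, 2, 3)
print(cosine_similarity(a, c))  # 0.0, no n-gram in common
```

Because the vectors are built from cluster labels rather than recognized text, the comparison does not depend on the language of the document.
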

10
Corpus
  • UW1 database (600 dpi)

11
Experimental Results
  • Corpus I
  • E01-E26

12
Experimental Results
  • Corpus II

13
Experimental Results
14
Experimental Results
15
Experimental Results
16
Experimental Results
17
Conclusion and Future Work
  • A new method for imaged document retrieval without
    OCR
  • Language-independent retrieval
  • Improvement of robustness to different fonts and
    noisy documents