Document Image Analysis - PowerPoint PPT Presentation

Provided by: ffar

1
Document Image Analysis: An Introduction (CSE 717)
2
Document Image Analysis
  • DIA is the theory and practice of recovering the
    symbol structures of digital images scanned from
    paper or produced by computer
  • DIA is a subfield of digital image processing
  • Digital images of natural objects (X-rays,
    fingerprints, faces, scenery, etc.) are NOT part
    of DIA
  • Digital images of symbolic objects (postal
    addresses, printed articles, forms, music sheets,
    engineering drawings, topographic maps) belong to
    DIA
  • Sources: scanners, printers, fax machines, hand!
  • Incidental text: license plates, billboards, and
    subtitles in photos and video
  • WWW ??
  • DIA's grand goal is to take us to the land of the
    paperless office

3
Document Image Analysis
  • Graphical Processing
      • Line Processing: lines, curves, corners
      • Region and Symbol Processing: filled regions
  • Textual Processing
      • Optical Character Recognition: text
      • Page Layout Analysis: skew, blocks, paragraphs
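Page layout analysis often begins by locating text lines. One simple technique, assumed here for illustration rather than taken from the slides, is the horizontal projection profile: count the ink pixels in each row of a binarized page and treat each run of non-empty rows as a text line, with blank rows as the gaps between lines. This is a minimal sketch, assuming a deskewed page stored as a list of rows with 1 for ink and 0 for background; the names `horizontal_profile`, `find_text_lines`, and `min_ink` are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # binary page: 1 = ink, 0 = background


def horizontal_profile(page: Image) -> List[int]:
    """Count ink pixels in each row of a binary page."""
    return [sum(row) for row in page]


def find_text_lines(page: Image, min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (first_row, last_row) for each maximal run of rows whose
    ink count is >= min_ink; blank runs separate consecutive lines."""
    lines: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for y, count in enumerate(horizontal_profile(page)):
        if count >= min_ink and start is None:
            start = y                       # entering a band of ink
        elif count < min_ink and start is not None:
            lines.append((start, y - 1))    # leaving the band
            start = None
    if start is not None:                   # band touches the bottom edge
        lines.append((start, len(page) - 1))
    return lines
```

On a skewed page the blank gaps between lines shrink or vanish, which is why skew estimation normally precedes this kind of profile-based segmentation.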
4
Document Image Analysis
Levels of processing, for text and for graphics:
  • Pixels
      Text: Preprocessing (representation, noise removal, binarization, skew, script ID, font ID)
      Graphics: Preprocessing (representation, noise removal, binarization, thinning, vectorization)
  • Primitives
      Text: Glyph Recognition (connected components, strokes, punctuation, words)
      Graphics: Primitive Recognition (straight lines, curve segments, junctions, nodes, loops, characters)
  • Structures
      Text: Text Recognition (word segmentation, text line reconstruction, table analysis, linguistics)
      Graphics: Structure Recognition (text fields, legends, labels, dimensions, graphics symbols)
  • Documents
      Text: Page Layout Analysis (text versus non-text, physical component analysis, logical component analysis, functional component analysis, compression)
      Graphics: Interpretation (component recognition, connectivity analysis, CAD layer separation, database attribute extraction, compression)
  • Corpus
      Text: Information Retrieval (document classification, indexing, search, security, authentication, privacy)
      Graphics: Database, CAD (validation, search, update)
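The first two levels of the text pipeline (pixels to primitives) can be illustrated with two standard steps: global binarization by Otsu's method, then grouping ink pixels into 8-connected components whose bounding boxes serve as candidate glyphs. This is a minimal pure-Python sketch, assuming an 8-bit grayscale page with dark ink on light paper; the helper names are hypothetical, and a practical system would also deal with noise removal and uneven illumination.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]  # rows of pixel values


def otsu_threshold(gray: Image) -> int:
    """Return the t in 0..255 maximizing the between-class variance of
    the classes {v <= t} and {v > t} (Otsu's method)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_b, sum_b = 0, 0               # count and value sum of pixels <= t
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        sum_b += t * hist[t]
        w_f = total - w_b
        if w_b == 0:
            continue                # dark class still empty
        if w_f == 0:
            break                   # light class now empty
        m_b = sum_b / w_b
        m_f = (sum_all - sum_b) / w_f
        var_between = w_b * w_f * (m_b - m_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray: Image) -> Image:
    """Map dark pixels (<= Otsu threshold) to ink = 1, the rest to 0."""
    t = otsu_threshold(gray)
    return [[1 if v <= t else 0 for v in row] for row in gray]


def connected_components(binary: Image) -> List[Tuple[int, int, int, int]]:
    """Group 8-connected ink pixels; return one (top, left, bottom, right)
    bounding box per component, in row-major discovery order."""
    h = len(binary)
    w = len(binary[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:            # breadth-first flood fill
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

Typical use is `connected_components(binarize(gray))`. The choice of 8-connectivity keeps diagonal strokes in one component; 4-connectivity would split them.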
5
Postal Examples
6
Forms
7
Unconstrained Text
8
Graphics Documents
9
References
  • Handbook of Character Recognition and Document
    Image Analysis, H. Bunke and P. S. P. Wang (editors),
    World Scientific Press
  • Document Image Analysis, L. O'Gorman and R. Kasturi,
    IEEE Computer Society Press
  • Proceedings of the International Conference on
    Document Analysis and Recognition
  • Proceedings of the International Workshop on
    Document Analysis Systems
  • Proceedings of the Symposium on Document Image
    Understanding Technology

10
  • OCR Features and Systems
      • Script ID, Devanagari OCR, Tamil OCR, machine-printed
        (MP) versus handwritten (HW) text
  • Handwriting Recognition
      • Postal applications, Arabic documents
  • Classifiers and Learning
      • Multi-classifier systems
  • Layout Analysis
      • Skew correction, geometric methods, text/graphics
        separation, logical labeling
  • Tables and Forms
      • Detecting tables in HTML documents, use of graph
        grammars, semantics
  • Document Engineering
      • Processing of historical documents (palm leaf
        manuscripts)
  • Camera Based DIA
      • Locating and reading barcodes
  • New Applications: CAPTCHA
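Skew correction appears under layout analysis. One common geometric approach, the projection-profile method, projects the ink pixels along a range of candidate angles and keeps the angle whose profile is sharpest, because aligned text lines concentrate ink into few bins. This is a minimal sketch, assuming a binary page (1 = ink); sharpness is scored here as the sum of squared bin counts, and the function names, the ±5° search range, and the 0.5° step are illustrative assumptions. Angles follow image coordinates (y grows downward), so a positive result means lines descend to the right, and deskewing would rotate by the negated angle.

```python
import math
from collections import Counter
from typing import List, Tuple

Image = List[List[int]]  # binary page: 1 = ink, 0 = background


def profile_energy(points: List[Tuple[int, int]], angle_deg: float) -> int:
    """Project ink pixels onto the normal of a candidate line direction
    and return the sum of squared bin counts (higher = sharper)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    # Every point on a line tilted by angle_deg has the same y*cos - x*sin.
    bins = Counter(round(y * c - x * s) for y, x in points)
    return sum(n * n for n in bins.values())


def estimate_skew(binary: Image, max_angle: float = 5.0, step: float = 0.5) -> float:
    """Return the angle in [-max_angle, max_angle] (degrees, on a grid of
    `step`) with the sharpest projection profile; 0.0 for an empty page."""
    points = [(y, x) for y, row in enumerate(binary)
              for x, v in enumerate(row) if v == 1]
    if not points:
        return 0.0
    n = int(round(max_angle / step))
    candidates = [k * step for k in range(-n, n + 1)]
    # max() returns the first candidate among ties.
    return max(candidates, key=lambda a: profile_energy(points, a))
```

As a usage example, a synthetic line that steps down one row every 20 pixels (slope 0.05, about 2.9°) should give an estimate near that slope on the 0.5° grid, while a perfectly horizontal line gives 0.0. A finer grid or a coarse-to-fine search trades run time for precision.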