UNSD Regional Workshop on Census Data Processing for the English speaking African Countries: Contemporary technologies for data capture, methodology and practice of data editing

1
Optical Data Capture: Optical Character
Recognition (OCR), Intelligent Character
Recognition (ICR), Intelligent Recognition (IR)
2
Summary
  • Concept/Definition
  • Forms Design
  • Scanners and Software
  • Storage
  • Accuracy
  • OCR/ICR Advantages and Disadvantages
  • Intelligent Recognition (IR)
  • Commercial Suppliers

3
Definition/Concept of OCR
  • Gives scanning and imaging systems the ability to
    turn images of machine-printed characters into
    machine-readable characters
  • Images of the machine-printed characters are
    extracted from a bitmap of the scanned image

4
Definition/Concept of ICR
  • Gives scanning and imaging systems the ability to
    turn images of handwritten characters into
    machine-readable characters
  • Images of the handwritten characters are
    extracted from a bitmap of the scanned image

5
OCR and ICR Differences
  • OCR is less accurate than OMR but more accurate
    than ICR
  • ICR will require editing to achieve high data
    coverage
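
One common way to organize the editing step is to send only low-confidence characters to a human editor. The sketch below is a minimal illustration under stated assumptions, not any vendor's method: the `RecognizedChar` type, the confidence scores and the 0.90 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecognizedChar:
    text: str          # character proposed by the (hypothetical) engine
    confidence: float  # assumed score between 0.0 and 1.0


def split_for_editing(chars, threshold=0.90):
    """Separate characters accepted as-is from those sent to an editor."""
    accepted = [c for c in chars if c.confidence >= threshold]
    to_review = [c for c in chars if c.confidence < threshold]
    return accepted, to_review


# Hypothetical three-digit field; the middle digit was read with low confidence.
field = [RecognizedChar("2", 0.99), RecognizedChar("7", 0.62), RecognizedChar("5", 0.95)]
accepted, to_review = split_for_editing(field)
print([c.text for c in to_review])  # characters a human editor would check
```

Raising the threshold sends more characters to editing and lowers the share captured automatically; lowering it does the opposite.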

6
Forms
  • OCR/ICR imposes less strict form design
    requirements than OMR
  • No timing tracks
  • Has registration marks
  • ICR requires boxes hand-printed with one
    alphanumeric character per box
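
The one-character-per-box rule lets a field image be cut into fixed cells before recognition. A minimal sketch, assuming the field has already been cropped to a bitmap of 0/1 pixels and that all boxes have equal width (both are simplifications; the function names are hypothetical):

```python
def split_into_boxes(bitmap, n_boxes):
    """Cut a field bitmap (rows of 0/1 pixels) into equal-width boxes."""
    width = len(bitmap[0])
    box_width = width // n_boxes
    return [
        [row[i * box_width:(i + 1) * box_width] for row in bitmap]
        for i in range(n_boxes)
    ]


def is_blank(box):
    """A box with no dark pixels holds no character."""
    return not any(pixel for row in box for pixel in row)


# Toy 2x6 field holding three 2-pixel-wide boxes; the middle box is empty.
field = [
    [1, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 1, 1],
]
boxes = split_into_boxes(field, 3)
print([is_blank(b) for b in boxes])  # [False, True, False]
```

Each non-blank box would then be passed to the recognition engine as a single character image.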

7
OCR
  • Forms
  • OCR/ICR is more flexible since no timing tracks
    are required and the image can float on the page
  • The use of drop-out color reduces the size of the
    scanner's output and enhances accuracy
  • ICR/OCR technology often uses registration marks
    on the four corners of a document in the
    recognition of an image
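
Registration marks are what allow the image to float: once the marks are found, field positions from the form template can be shifted to match the scan. The sketch below assumes the marks have already been detected and corrects for translation only; real systems also handle rotation and scaling. All coordinates and function names are hypothetical.

```python
def estimate_offset(expected_marks, detected_marks):
    """Average displacement of the detected marks from their template positions."""
    n = len(expected_marks)
    dx = sum(d[0] - e[0] for e, d in zip(expected_marks, detected_marks)) / n
    dy = sum(d[1] - e[1] for e, d in zip(expected_marks, detected_marks)) / n
    return dx, dy


def locate_field(field_box, offset):
    """Shift a template field box (x, y, width, height) onto the scanned page."""
    x, y, w, h = field_box
    dx, dy = offset
    return (x + dx, y + dy, w, h)


# Four corner marks in template coordinates, and where the scan found them.
template_marks = [(0, 0), (1000, 0), (0, 1400), (1000, 1400)]
scanned_marks = [(12, -8), (1012, -8), (12, 1392), (1012, 1392)]
offset = estimate_offset(template_marks, scanned_marks)
print(locate_field((200, 300, 400, 50), offset))  # (212.0, 292.0, 400, 50)
```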

9
OCR/ICR Scanners and Software
  • Forms are scanned through a scanner, and then
    the recognition engine of the OCR/ICR system
    interprets the images and turns images of
    handwritten or printed characters into ASCII data
    (machine-readable characters)
  • Users can scan forms without performing OCR
  • Speeds range from 85 to 160 sheets/min (depending
    on the recognition engine)
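
The quoted speed range translates directly into scanning time. A rough worked example, assuming a single scanner running continuously and a hypothetical workload of one million sheets (the workload figure is not from the presentation):

```python
def scanning_hours(sheets, sheets_per_minute):
    """Hours needed to scan a workload at a constant speed."""
    return sheets / sheets_per_minute / 60


total_sheets = 1_000_000  # hypothetical workload, for illustration only
for speed in (85, 160):   # speed range quoted on the slide
    hours = scanning_hours(total_sheets, speed)
    print(f"{speed} sheets/min: {hours:.1f} hours")
```

At these speeds the workload takes roughly 196 hours at 85 sheets/min and roughly 104 hours at 160 sheets/min, before any time for form handling, jams or rescans.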

10
OCR/ICR Storage Characteristics
  • Storage/Retrieval
  • Images are scanned and stored and maintained
    electronically
  • There is no need to store the paper forms as long
    as you safeguard the electronic files
  • With OCR/ICR technologies, images can be scanned,
    indexed, and written to optical media

11
Ideal OCR/ICR Accuracy Thresholds
  • Accuracy
  • Accuracy achieved by data entry clerks (99.5%)
    is approximately equal to that of OCR/ICR in
    perfect tuning (99.5%)
  • Up to 99.9% accuracy with editing (like OMR)
  • The recognition engine must be tuned, tested and
    validated very carefully
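
To put these thresholds in perspective, the sketch below reads the slide's figures as per-character percentages and converts them into expected error counts. The character volume, the 10-character field length and the assumption that errors are independent are all illustrative, not from the presentation.

```python
def expected_errors(n_characters, accuracy):
    """Expected number of wrongly captured characters."""
    return n_characters * (1.0 - accuracy)


def field_accuracy(char_accuracy, field_length):
    """Chance that every character in a field is right (independent errors assumed)."""
    return char_accuracy ** field_length


n = 1_000_000  # hypothetical number of captured characters
print(round(expected_errors(n, 0.995)))     # 5000 errors at 99.5%
print(round(expected_errors(n, 0.999)))     # 1000 errors at 99.9%
print(round(field_accuracy(0.995, 10), 3))  # 0.951 for a 10-character field
```

The gap between 99.5% and 99.9% looks small, but under these assumptions it is a fivefold difference in the number of errors left in the data.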

12
OCR/ICR Advantages
  • Advantages
  • Recognition engines used with imaging can capture
    highly specialized data sets
  • OCR/ICR recognizes machine-printed or hand-printed
    characters
  • Scanning and recognition allow efficient
    management and planning for the rest of the
    processing workload
  • Quick retrieval for editing and reprocessing

13
OCR/ICR Disadvantages
  • Technology is costly
  • May require significant manual intervention
  • Additional workload for data collectors
  • ICR has severe limitations when it comes to human
    handwriting
  • Characters must be hand-printed/machine-printed
    as separate characters in boxes
  • Ineffective when dealing with cursive characters

14
OMR-OCR/ICR Compared
15
OCR/ICR Challenges/Issues
  • Has issues corresponding to those of OMR
  • Algorithm development (Preparation of memory
    dictionary)
  • Processing time considerations due to recognition
    engine
  • Development costs

16
Definition/Concept of IR
  • State-of-the-art recognition technology
  • Gives scanning and imaging systems the ability to
    turn images of handwritten and cursive
    characters into machine-readable characters
  • Images of the handwritten and cursive characters
    are extracted from a bitmap of the scanned image
  • The ability to capture cursive makes this method
    unique

17
Definition/Concept of IR
  • Eight elements make up the trajectories of
    all cursive letters (figure 1)

Photo: Parascript LLC
18
Definition/Concept of IR
  • Intelligent Recognition uses context dynamically
  • Context is used during the recognition process,
    improving the accuracy of results
  • Context helps to identify letters where the
    symbol segmentation of an image is ambiguous

Photo: Parascript LLC
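
The role of context can be illustrated with a toy example. This is a simplified sketch, not Parascript's algorithm: it assumes segmentation is already done, that the engine returns scored candidates per character position, and that a lexicon of valid answers is available. All scores and words are hypothetical.

```python
def greedy_reading(candidates):
    """Pick the top-scoring character at each position, ignoring context."""
    return "".join(max(options, key=options.get) for options in candidates)


def best_word(candidates, lexicon):
    """Pick the lexicon word whose characters have the highest combined score."""
    best, best_score = None, 0.0
    for word in lexicon:
        if len(word) != len(candidates):
            continue
        score = 1.0
        for ch, options in zip(word, candidates):
            score *= options.get(ch, 0.0)  # characters never proposed score zero
        if score > best_score:
            best, best_score = word, score
    return best


# Hypothetical candidate scores for a four-letter handwritten answer.
candidates = [
    {"n": 0.55, "m": 0.45},
    {"a": 0.90, "o": 0.10},
    {"l": 0.80, "i": 0.20},
    {"e": 0.70, "c": 0.30},
]
lexicon = ["male", "mole", "mile", "female"]
print(greedy_reading(candidates))      # nale
print(best_word(candidates, lexicon))  # male
```

Reading each character in isolation gives a non-word, while restricting the result to the lexicon recovers a valid answer even though its first letter was not the top candidate.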
19
Technology Evolution
[Figure: diagram relating OCR, ICR and Intelligent Recognition to form types and text styles. Labels recovered from the diagram:]
  • Technologies: OCR, ICR, Intelligent Recognition
  • Form types: specially designed for automatic
    recognition; constraining boxes or combs; drop-out
    ink for preprinted text boxes; no special form
    design; no constraining boxes or combs; dirty,
    noisy forms; bad quality paper; legacy forms
  • Text styles: machine print; constrained handprint;
    unconstrained handprint; cursive; condensed
    strings; bad quality machine print

Illustration: Conference on Technology Options
for 2011 Census
20
Major Commercial Suppliers
  • Top Image Systems (TIS) (http://www.topimagesystems.com)
  • ReadSoft (http://www.readsoft.com)
  • Teleform (http://www.intelliscan.com/TeleForm1.htm)
  • Scanner Suppliers
  • Fujitsu, Canon, Bell & Howell, Kodak

21
THANK YOU!