From Anthrax to ZIP Codes: The Handwriting is on the Wall

Slides: 51
Provided by: gov55

Transcript and Presenter's Notes


1
From Anthrax to ZIP Codes: The Handwriting is on the Wall
  • Venu Govindaraju
  • Dept. of Computer Science and Engineering
  • University at Buffalo
  • Venu_at_cedar.buffalo.edu

2
Outline
  • Success in Postal Application
  • Role of Handwritten Word Recognition
  • Word Recognition
  • Lexicon Driven Word Recognition
  • Lexicon Free Word Recognition
  • New Models
  • Interactive Cognitive Models
  • New Research Areas
  • Lexicon density
  • Lexicon Reduction and Combination
  • Other Applications

3
USPS HWAI Background
  • Postal Sponsorship Started 1984
  • 370 Academic Articles Published
  • Millions of Letters Examined
  • Many Experimental Systems Built and Tested
  • Migrated from Hardware to Software System
  • Only Postal Research Continuously Funded

4
Pattern Recognition Tasks
  • Items to be Recognized, Read, and Evaluated
    (Machine printed and Script)
  • Delivery address, senders address, endorsements
  • Linear Codes, Mail Class
  • Indicia (2D-Codes, Meter Marks)

5
Deployed..
  • USA
    • 250 P&DC sites
    • 27 Remote Encoding Centers
    • 25 Billion Images Processed Annually
    • 89% Automated Bar-coding
  • UK
    • 67 Processing Centers
    • 27 Million Pieces Per Day
    • 9.7 Million Pieces Per Hour Peak
  • Australia

6
Scope - Others
  • Royal Mail
    • 67 Processing Centers
    • 27 Million Pieces Per Day
    • 9.7 Million Pieces Per Hour Peak
  • Australia Post
    • Similar to Royal Mail

7
(No Transcript)
8
RCR Overview
9
The Right Technology
  • Technological Nexus
  • Sophisticated Algorithms
  • High Speed Processors
  • Large Disk Capacities
  • High Speed Memories

10
At the Right Price
  • Processing Type: Cost/1000 Pieces
  • Manual: $47.78
  • Mechanized: $27.46
  • Automated: $5.30

11
80% encode rate and counting!
12
Impact
  • Applications of CEDAR research are helping to
    automate tasks at IRS and USPS
  • In the first year that USPS used CEDAR-developed software
    to read handwritten addresses on envelopes, it saved
    $100 million
  • 1997-1999: with the USPS deployment of CEDAR-developed
    RCRs, USPS saved 12 million work hours and over
    $340 million
  • 500 scientific publications and 10 patents

13
Outline
  • Success in Postal Application
  • Role of Handwritten Word Recognition
  • Word Recognition
  • Lexicon Driven Word Recognition
  • Lexicon Free Word Recognition
  • New Models
  • Interactive Cognitive Models
  • New Research Areas
  • Lexicon density
  • Lexicon Reduction and Combination
  • Other Applications

14
(No Transcript)
15
Handwritten Address Interpretation (HWAI)
[Flowchart]
  • Input: Address Block Image
  • Stages: Chaincode Generation, Pre-scan with Digit Recognizer, Line
    Segmentation, Word Separation, Parsing (a) shape, (b) syntax, Digit
    String Recognition, Phrase Recognition, Encoding Strategy, Database
    Queries
  • Decision: Finalized? (Yes / No)
  • Adaptive Image Enhancement; Pass 1 or Pass 2
  • Output (e.g., 14221 3851 11): 5, 9, or 11 digit encode OR reject
16
Context Provided by Postal Directories
  • Create street name lexicon
  • DPF yields 8 street names
  • ZIP+4 yields 31 street names (on average about 5
    times more)
  • HAWLEY RD 1034, NEWGATE RD 1533, BEE MOUNTAIN RD 1615,
    DORMAN RD 1642, BOWERS HILL RD 1757, FREEMAN RD 1781,
    PUNKUP RD 1784, PARK RD 6124

17
CEDAR
Delivery Point File
  • One record per delivery point in USA
  • Provided weekly by USPS, San Mateo
  • Raw DPF
  • 138 million records
  • 15 GB (114 bytes per record)
  • 41,889 ZIP Code files
  • Fields of interest to HWAI
  • ZIP Code, record type (e.g., street, firm, PO Box,
    ...), street name, primary number, secondary
    number, add-on
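
As a rough illustration of how a delivery point file can supply recognition context, the sketch below filters toy records by ZIP Code (and optionally by primary number) to build a street name lexicon. The record layout, the `DeliveryPoint` and `street_lexicon` names, and all sample values are assumptions made for illustration; they are not the actual DPF format.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class DeliveryPoint:
    """Toy stand-in for one DPF record (field names are hypothetical)."""
    zip_code: str
    record_type: str      # e.g. "street", "firm", "po_box"
    street_name: str
    primary_number: str
    add_on: str           # 4-digit add-on


def street_lexicon(records: Iterable[DeliveryPoint], zip_code: str,
                   primary_number: Optional[str] = None) -> List[str]:
    """Return the sorted, de-duplicated street names matching the query."""
    names = {
        r.street_name
        for r in records
        if r.zip_code == zip_code
        and r.record_type == "street"
        and (primary_number is None or r.primary_number == primary_number)
    }
    return sorted(names)


# Made-up records; only the street names echo the slides.
RECORDS = [
    DeliveryPoint("00001", "street", "HAWLEY RD", "12", "0001"),
    DeliveryPoint("00001", "street", "NEWGATE RD", "12", "0002"),
    DeliveryPoint("00001", "street", "PARK RD", "40", "0003"),
    DeliveryPoint("00002", "street", "DORMAN RD", "12", "0004"),
]

if __name__ == "__main__":
    print(street_lexicon(RECORDS, "00001"))
    print(street_lexicon(RECORDS, "00001", primary_number="12"))
```

Adding a recognized primary number to the query shrinks the lexicon, which is the kind of context the word recognizer can exploit.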

18
CEDAR
Relevant Statistics
  • ZIP Code
  • 30% of ZIP Codes contain a single street name
  • 5% of ZIP Codes contain a single primary number
  • 2% of ZIP Codes contain a single add-on
  • Maximum number of records returned is 3,071
  • Maximum number of records returned is 3,070

19
Outline
  • Success in Postal Application
  • Role of Handwritten Word Recognition
  • Word Recognition
  • Lexicon Driven Word Recognition
  • Lexicon Free Word Recognition
  • New Models
  • Interactive Cognitive Models
  • New Research Areas
  • Lexicon density
  • Lexicon Reduction and Combination
  • Other Applications

20
Handwriting Recognition
[Diagram]
  • Input: Signal
  • Context Lexicon: Boston, Buffalo, Williamsville, Bidwell, James,
    Bryant, ....
  • Word Recognition Engine
  • Output: Ranked lexicon with distance scores: Bryant 2.3, Boston 1.8,
    Bidwell 2.6, James 4.7, Buffalo 8.9
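
A minimal sketch of the ranking step, assuming that a smaller distance score means a better match. The scores are the ones listed on this slide; the `rank_lexicon` helper is a hypothetical name.

```python
from typing import Dict, List, Tuple


def rank_lexicon(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort lexicon entries by ascending distance (best match first)."""
    return sorted(scores.items(), key=lambda item: item[1])


# Distance scores as listed on the slide.
SCORES = {"Bryant": 2.3, "Boston": 1.8, "Bidwell": 2.6, "James": 4.7, "Buffalo": 8.9}

if __name__ == "__main__":
    for word, distance in rank_lexicon(SCORES):
        print(f"{word:10s} {distance:.1f}")
```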
21
WMR
  • Distance between the first character "w" of the lexicon entry "word"
    and the image between
    • segments 1 and 4 is 5.0
    • segments 1 and 3 is 7.2
    • segments 1 and 2 is 7.6
  • Find the best way of accounting for the characters "w", "o", "r", "d"
    by consuming all segments 1 to 8 in the process
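
The matching described here can be sketched as a dynamic program over (characters matched, segments consumed). This is an illustrative sketch, not the WMR implementation: the limit of four segments per character, the `match_word` name, and every distance other than the three "w" values from the slide are assumptions.

```python
import math
from typing import Callable, List, Tuple


def match_word(word: str, n_segments: int,
               char_distance: Callable[[str, int, int], float],
               max_span: int = 4) -> Tuple[float, List[Tuple[int, int]]]:
    """Assign consecutive segment ranges to the characters of `word` so that
    every segment is used and the summed character distance is minimal."""
    n_chars = len(word)
    best = [[math.inf] * (n_segments + 1) for _ in range(n_chars + 1)]
    back = [[0] * (n_segments + 1) for _ in range(n_chars + 1)]
    best[0][0] = 0.0
    for k in range(1, n_chars + 1):
        for end in range(1, n_segments + 1):
            for span in range(1, min(max_span, end) + 1):
                start = end - span  # character k covers segments start+1..end
                if best[k - 1][start] == math.inf:
                    continue
                cost = best[k - 1][start] + char_distance(word[k - 1], start + 1, end)
                if cost < best[k][end]:
                    best[k][end] = cost
                    back[k][end] = start
    if best[n_chars][n_segments] == math.inf:
        return math.inf, []
    # Walk the back-pointers to recover each character's segment range.
    spans, end = [], n_segments
    for k in range(n_chars, 0, -1):
        start = back[k][end]
        spans.append((start + 1, end))
        end = start
    spans.reverse()
    return best[n_chars][n_segments], spans


# "w" distances come from the slide; the other entries are invented.
TOY = {("w", 1, 4): 5.0, ("w", 1, 3): 7.2, ("w", 1, 2): 7.6,
       ("o", 5, 5): 2.0, ("r", 6, 6): 3.0, ("d", 7, 8): 1.0}


def toy_distance(char: str, first: int, last: int) -> float:
    """Look up a toy distance; unknown combinations get a high default."""
    return TOY.get((char, first, last), 9.0)


if __name__ == "__main__":
    print(match_word("word", 8, toy_distance))
```

Running the same routine for each lexicon entry and sorting by the returned cost would give a ranked lexicon.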
22
CMR
  • Image from 1 to 3 is a in with 0.5 confidence
  • Image from segment 1 to 4 is a w with 0.7
    confidence
  • Image from segment 1 to 5 is a w with 0.6
    confidence and an m with 0.3 confidence

  • Graph edge labels (character and confidence): w .6, m .3; w .7; d .8;
    o .5; u .5, v .2; i .8, l .8; i .7; r .4; u .3; m .2; m .1
  • Find the best path in the graph from segment 1 to 8: w o r d
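
One way to picture the lexicon-free search is as a best-path problem on a graph whose nodes are segmentation points and whose edges carry character hypotheses with confidences. The sketch below is an assumption-laden illustration: only the edges 1 to 4 and 1 to 5 follow the slide text, the remaining edge positions are invented, and scoring a path by the product of its confidences is a choice made here rather than the CMR scoring rule.

```python
import math
from typing import Dict, List, Tuple

Edge = Tuple[int, int]
Hypotheses = List[Tuple[str, float]]


def best_path(edges: Dict[Edge, Hypotheses], start: int, end: int) -> Tuple[str, float]:
    """Return the string and confidence product of the best start-to-end path."""
    best: Dict[int, Tuple[float, str]] = {start: (0.0, "")}
    # Every edge goes from a lower to a higher node, so sorted order
    # relaxes all edges into a node before any edge that leaves it.
    for (i, j) in sorted(edges):
        if i not in best:
            continue
        log_score, text = best[i]
        char, conf = max(edges[(i, j)], key=lambda h: h[1])
        candidate = (log_score + math.log(conf), text + char)
        if j not in best or candidate[0] > best[j][0]:
            best[j] = candidate
    if end not in best:
        raise ValueError("no path between the requested nodes")
    log_score, text = best[end]
    return text, math.exp(log_score)


# Hypothetical graph; edge placement is assumed except (1, 4) and (1, 5).
TOY_EDGES: Dict[Edge, Hypotheses] = {
    (1, 3): [("u", 0.5), ("v", 0.2)],
    (1, 4): [("w", 0.7)],
    (1, 5): [("w", 0.6), ("m", 0.3)],
    (3, 4): [("i", 0.8), ("l", 0.8)],
    (4, 6): [("o", 0.5)],
    (5, 6): [("u", 0.3)],
    (6, 7): [("r", 0.4)],
    (7, 8): [("d", 0.8)],
}

if __name__ == "__main__":
    print(best_path(TOY_EDGES, 1, 8))
```

A plain product favors paths with fewer edges, so a practical system would likely normalize for path length; that refinement is left out to keep the sketch short.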
23
Outline
  • Success in postal application
  • Role of Handwritten Word Recognition
  • Word Recognition
  • Lexicon Driven Word Recognition
  • Lexicon Free Word Recognition
  • New Models
  • Interactive Cognitive Models
  • New Research Areas
  • Lexicon density
  • Lexicon Reduction and Combination
  • Other Applications

24
Multiple Choice Paradigm
  • a) Amherst b) Buffalo c) Boston
  • d) None of the above

25
Grapheme Models
26
Stochastic Models and Continuous Attributes
27
Results
28
Interactive Models (McClelland and Rumelhart,
Psychological Review, 1981)
  • Words: ABLE, TRIP, TRAP
  • Letters: A, T, N
  • Features
29
Cognitive Handwritten Word Recognition
  • Lexicon 1: West Central Street, West Main Street, Sunset Avenue
  • Lexicon 2: West Central Street, East Central Street, Sunset Avenue
  • Lexicon 3: West Central Street, West Central Avenue, Sunset Avenue
  • Interactive Model
  • Features: T-crossings, loops, ascenders, descenders, length
  • Image
30
Adaptive Character Recognition (Park and
Govindaraju, IEEE CVPR 2000)
  • Adaptive selection of features
  • Adaptive number of features
  • Adaptive resolutions
  • Adaptive sequencing of features
  • Adaptive termination conditions

31
Features
4 gradient features
5 moment features
Vector code book
32
Feature Space
  • V x Nc x Ixy
  • 2^9 x 10 x 85 (quad tree, 4 levels)
  • Recognition rate and feature V
  • GSC: V = 2^512
  • Tradeoffs space vs accuracy
  • Hierarchical space with additional resolution and
    features as needed
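
One reading of the feature-space dimensions, assuming V counts the combinations of 9 binary features (4 gradient plus 5 moment), Nc is the 10 digit classes, and Ixy counts the cells of a full 4-level quad tree. The binary-feature assumption and the `quadtree_cells` helper are illustrative, not taken from the slides.

```python
def quadtree_cells(levels: int) -> int:
    """Number of cells in a full quad tree: 1 + 4 + 16 + ... per level."""
    return sum(4 ** level for level in range(levels))


N_FEATURES = 4 + 5          # gradient + moment features
V = 2 ** N_FEATURES         # assumes each feature is binary
N_CLASSES = 10              # digit classes
CELLS = quadtree_cells(4)   # 1 + 4 + 16 + 64

if __name__ == "__main__":
    print(V, N_CLASSES, CELLS, V * N_CLASSES * CELLS)
```

Under these assumptions the table has 435,200 entries, which is small compared with what a 512-bit feature vector would require.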

33
Active Recognition Using Quad Trees
34
Experimental Results
35
(No Transcript)
36
Results
10-class digit recognition: 25,656 training and
12,242 test (Postal NIST)
37
Outline
  • Success in Postal Application
  • Role of Handwritten Word Recognition
  • Word Recognition
  • Lexicon Driven Word Recognition
  • Lexicon Free Word Recognition
  • New Models
  • Interactive Cognitive Models
  • New Research Areas
  • Lexicon Reduction and Combination
  • Lexicon Density and Prediction of Performance
  • Other Applications

38
Combination and Dynamic Selection (Govindaraju
and Ianakiev, MCS 2000)
[Diagram: image and lexicon feeding word recognizers WR 1, WR 2, and WR 3,
with lexicon reduction levels Top 50, Top 5, and 1]
  • Optimization problem
  • Combinatorial explosion in
    • arrangement of recognizers
    • lexicon reduction levels
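
A small sketch of a serial arrangement in which each recognizer re-ranks the surviving candidates and passes a shorter list to the next stage. The three toy recognizers, the use of a string as a stand-in for the image, and the stage sizes are all assumptions; the slide's own levels are Top 50, Top 5, and 1.

```python
from typing import Callable, List, Sequence, Tuple

# A recognizer maps (image stand-in, candidate word) to a distance.
Recognizer = Callable[[str, str], float]


def cascade(image: str, lexicon: Sequence[str],
            stages: Sequence[Tuple[Recognizer, int]]) -> List[str]:
    """Run recognizers in order, keeping the `keep` best words after each."""
    candidates = list(lexicon)
    for recognizer, keep in stages:
        candidates = sorted(candidates, key=lambda w: recognizer(image, w))[:keep]
    return candidates


def by_length(image: str, word: str) -> float:
    """Toy recognizer: compares lengths only."""
    return abs(len(image) - len(word))


def by_first_letter(image: str, word: str) -> float:
    """Toy recognizer: compares the first character only."""
    return 0.0 if image[:1] == word[:1] else 1.0


def by_mismatch(image: str, word: str) -> float:
    """Toy recognizer: position-wise mismatches plus the length difference."""
    diff = sum(a != b for a, b in zip(image, word))
    return diff + abs(len(image) - len(word))


LEXICON = ["boston", "buffalo", "bidwell", "james", "bryant", "williamsville"]
STAGES = [(by_length, 4), (by_first_letter, 2), (by_mismatch, 1)]

if __name__ == "__main__":
    print(cascade("buffalo", LEXICON, STAGES))
```

Both the order of the recognizers and the `keep` value at each stage are free parameters, which is where the combinatorial explosion comes from.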

39
Lexicon Density (Govindaraju, Slavik, and Xue,
IEEE PAMI 2002)
  • Lexicon 1: Me, He, So, To, In
  • Lexicon 2: Me, Memo, Memory, Memoirs, Mellon
40
Classifier Performance Prediction (Xue and
Govindaraju, IEEE PAMI 2002)
  • q: probability that the recognizer makes a unit distance error
  • D: average distance between any two words in the lexicon
  • n: lexicon size
  • p: performance
  • a, k: model parameters
  • $\ln(-\ln p) = (\ln q)\,D + a \ln \ln n + \ln k$
Outline
  • Success in Postal Application
  • Role of Handwritten Word Recognition
  • Word Recognition
  • Lexicon Driven Word Recognition
  • Lexicon Free Word Recognition
  • New Models
  • Interactive Cognitive Models
  • New Research Areas
  • Lexicon density
  • Lexicon Reduction and Combination
  • Other Applications

42
Bank Check Recognition
43
PCR Trend Analysis
44
NYS EMS PCR Form
  • NYS PCR Example
    • Thousands are filed a day.
    • Passed from EMS to Hospital.
  • PCR Purpose
    • Medical care/diagnosis
    • Legal Documentation
    • Quality Assurance
  • EMS Abbreviations
    • COPD: Chronic Obstructive Pulmonary Disease
    • CHF: Congestive Heart Failure
    • D/S: Dextrose in Saline
    • PID: Pelvic Inflammatory Disease
    • GSW: Gunshot Wound
    • NKA: No known allergies
    • KVO: Keep vein open
    • NaCl: Sodium Chloride

45
Medical Text Recognition and Data Mining
46
Reading Census Forms
Lexicon Anomalies
  • Space: "sales man" and "salesman"
  • Morphology, Abbreviation: "acct manager" and "account management"
  • Plural: "school" and "schools"
  • Typographical: "managar" and "manager"
47
Binarization
48
Historic Manuscripts
49
Mapping Snippets with Transcribed Text
50
Summary
  • Handwriting recognition technology
  • Pattern recognition task
  • Lexicon holds domain specific knowledge
  • Adaptive methods
  • Classifier combination methods
  • Many applications