From Anthrax to ZIP Codes: The Handwriting is on the Wall

1
From Anthrax to ZIP Codes: The Handwriting is on the Wall
  • Venu Govindaraju
  • Dept. of Computer Science & Engineering
  • University at Buffalo
  • venu_at_cedar.buffalo.edu

2
Outline
  • Success in Postal Application
  • Role of Handwriting Recognition
  • Recognition Models
  • Interactive Cognitive Models
  • New Research Areas
  • Other Applications

3
USPS HWAI Background
  • Postal Sponsorship Started 1984
  • 370 Academic Articles Published
  • Millions of Letters Examined
  • Many Experimental Systems Built and Tested
  • Migrated from Hardware to Software System
  • Only Postal Research Continuously Funded

4
Pattern Recognition Tasks
  • Items to be Recognized, Read, and Evaluated
    (Machine printed and Script)
  • Delivery address, sender's address, endorsements
  • Linear Codes, Mail Class
  • Indicia (2D-Codes, Meter Marks)

5
Deployed
  • USA
      • 250 PDC sites
      • 27 Remote Encoding Centers
      • 25 Billion Images Processed Annually
      • 89% Automated Bar-coding
  • UK
      • 67 Processing Centers
      • 27 Million Pieces Per Day
      • 9.7 Million Pieces Per Hour Peak
  • Australia

6
RCR Overview
7
At the Right Price
  Processing Type    Cost/1000 Pieces
  Manual             $47.78
  Mechanized         $27.46
  Automated          $5.30

8
80% encode rate and counting!
9
Impact
  • Applications of CEDAR research are helping to
    automate tasks at the IRS and USPS
  • In the first year that USPS used CEDAR-developed
    software to read handwritten addresses on
    envelopes, it saved $100 million
  • With the 1997-1999 deployment of CEDAR-developed
    RCRs, USPS saved 12 million work hours and over
    $340 million
  • 500 scientific publications and 10 patents

10
Outline
  • Success in Postal Application
  • Role of Handwriting Recognition
  • Recognition Models
  • Interactive Cognitive Models
  • New Research Areas
  • Other Applications

11
Role of Handwriting Recognition in Address
Interpretation
12
Context Provided by Postal Directories
  • <ZIP Code, Primary Number>
  • Create street name lexicon: <06478, 110>
  • DPF yields 8 street names
  • ZIP+4 yields 31 street names (on average about 5
    times more)
  • HAWLEY RD 1034, NEWGATE RD 1533, BEE MOUNTAIN RD
    1615, DORMAN RD 1642, BOWERS HILL RD 1757,
    FREEMAN RD 1781, PUNKUP RD 1784, PARK RD 6124
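
The directory lookup on this slide can be sketched as a keyed index. The following is a minimal illustration, not the deployed system: the record layout, the function names `build_index` and `street_lexicon`, and the extra "OAK ST" record are assumptions made for the example. Only the eight street names come from the slide.

```python
from collections import defaultdict

# Hypothetical records in the form (zip_code, primary_number, street_name).
# The eight street names are taken from the slide; the last record is invented
# so that the lookup has something to filter out.
RECORDS = [
    ("06478", "110", "HAWLEY RD"),
    ("06478", "110", "NEWGATE RD"),
    ("06478", "110", "BEE MOUNTAIN RD"),
    ("06478", "110", "DORMAN RD"),
    ("06478", "110", "BOWERS HILL RD"),
    ("06478", "110", "FREEMAN RD"),
    ("06478", "110", "PUNKUP RD"),
    ("06478", "110", "PARK RD"),
    ("06478", "112", "OAK ST"),  # invented record
]


def build_index(records):
    """Group street names by (ZIP Code, primary number)."""
    index = defaultdict(set)
    for zip_code, primary_number, street_name in records:
        index[(zip_code, primary_number)].add(street_name)
    return index


def street_lexicon(index, zip_code, primary_number):
    """Return the sorted street-name lexicon for one <ZIP Code, primary number> key."""
    return sorted(index.get((zip_code, primary_number), set()))


INDEX = build_index(RECORDS)
LEXICON = street_lexicon(INDEX, "06478", "110")
print(len(LEXICON), LEXICON)
```

The word recognizer then only has to choose among the entries of `LEXICON` rather than among all street names in the country.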

13
CEDAR
Context
  • One record per delivery point in USA
  • Provided weekly by USPS, San Mateo
  • Raw DPF
      • 138 million records
      • 15 GB (114 bytes per record)
      • 41,889 ZIP Code files
  • Fields of interest to HWAI
      • ZIP Code, street name, primary number,
        secondary number, add-on

14
CEDAR
Power of Context
  • ZIP Code
      • 30% of ZIP Codes contain a single street name
      • 5% of ZIP Codes contain a single primary number
      • 2% of ZIP Codes contain a single add-on
  • <ZIP Code, primary number>
      • Maximum number of records returned is 3,071
  • <ZIP Code, add-on>
      • Maximum number of records returned is 3,070

15
Outline
  • Success in Postal Application
  • Role of Handwriting Recognition
  • Recognition Models
  • Interactive Cognitive Models
  • New Research Areas
  • Other Applications

16
Handwriting Recognition
Ranked Lexicon
Context
17
Multiple Choice Question
Ranked Lexicon
Context
18
Lexicon Driven Model
Distance between the first character "w" of the
lexicon entry "word" and the image between
  • segments 1 and 4 is 5.0
  • segments 1 and 3 is 7.2
  • segments 1 and 2 is 7.6
Find the best way of accounting for the characters
w, o, r, d by consuming all segments 1 to 8 in the
process.
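
The matching described above can be sketched as a dynamic program over (characters used, segments consumed). This is an illustrative sketch under stated assumptions, not the recognizer itself: the three distances for "w" are the ones on the slide, while the distances for "o", "r", "d", the default distance of 9.0, and the limit of four segments per character are invented for the example.

```python
import math

# Distances for "w" come from the slide; all other values are invented.
DISTANCES = {
    ("w", 1, 4): 5.0,
    ("w", 1, 3): 7.2,
    ("w", 1, 2): 7.6,
    ("o", 5, 6): 4.0,  # invented
    ("r", 7, 7): 3.0,  # invented
    ("d", 8, 8): 2.5,  # invented
}


def char_distance(char, first_seg, last_seg):
    """Distance between a character and the image spanning first_seg..last_seg."""
    return DISTANCES.get((char, first_seg, last_seg), 9.0)  # 9.0 is an invented default


def match_word(word, n_segments, distance, max_span=4):
    """Return (total distance, spans) for the cheapest way to spell `word`
    while consuming every segment exactly once, in order."""
    inf = math.inf
    # best[i][j]: cheapest cost of explaining segments 1..j with the first i characters
    best = [[inf] * (n_segments + 1) for _ in range(len(word) + 1)]
    back = [[0] * (n_segments + 1) for _ in range(len(word) + 1)]
    best[0][0] = 0.0
    for i, char in enumerate(word, start=1):
        for j in range(i, n_segments + 1):
            for span in range(1, min(max_span, j) + 1):
                previous = best[i - 1][j - span]
                if previous == inf:
                    continue
                cost = previous + distance(char, j - span + 1, j)
                if cost < best[i][j]:
                    best[i][j] = cost
                    back[i][j] = span
    total = best[len(word)][n_segments]
    if total == inf:
        return total, []  # the word cannot cover all segments
    spans, j = [], n_segments
    for i in range(len(word), 0, -1):
        span = back[i][j]
        spans.append((word[i - 1], j - span + 1, j))
        j -= span
    spans.reverse()
    return total, spans


TOTAL, SPANS = match_word("word", 8, char_distance)
print(TOTAL, SPANS)
```

Running the same function once per lexicon entry and sorting by total distance gives a ranked lexicon.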
19
Lexicon Free Model
  • Image from segment 1 to 3 is a u with 0.5 confidence
  • Image from segment 1 to 4 is a w with 0.7
    confidence
  • Image from segment 1 to 5 is a w with 0.6
    confidence and an m with 0.3 confidence

Graph edge labels (character and confidence): w .6,
m .3; w .7; d .8; o .5; u .5, v .2; i .8, l .8; i .7;
r .4; u .3; m .2; m .1
Find the best path in the graph from segment 1 to 8:
w o r d
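
A best-path search over such a segmentation graph can be sketched as follows. Several things here are assumptions: the slide gives the edge labels but the transcript does not say which segments most of them span, so the edge placements below are invented; the "u" on segments 1 to 3 is a guess at a garbled label; and scoring a path by the product of its confidences is one common choice, not necessarily the one used in the talk.

```python
# Edges of a hypothetical segmentation graph: (start, end, character, confidence).
# Node k is the boundary after segment k, so segments 1 to 4 run from node 0 to node 4.
EDGES = [
    (0, 3, "u", 0.5), (0, 3, "v", 0.2),
    (0, 4, "w", 0.7),
    (0, 5, "w", 0.6), (0, 5, "m", 0.3),
    (3, 4, "i", 0.8), (3, 4, "l", 0.8),
    (4, 5, "i", 0.7),
    (4, 6, "o", 0.5),
    (4, 8, "m", 0.1),
    (5, 7, "m", 0.2),
    (6, 7, "r", 0.4),
    (6, 8, "u", 0.3),
    (7, 8, "d", 0.8),
]


def best_path(edges, n_segments):
    """Return (score, text) of the highest-scoring path from node 0 to node n_segments.

    Every edge must point forward (end > start), so visiting nodes in
    increasing order is enough. Returns None if the last node is unreachable.
    """
    best = [None] * (n_segments + 1)
    best[0] = (1.0, "")
    for node in range(n_segments):
        if best[node] is None:
            continue
        score, text = best[node]
        for start, end, char, confidence in edges:
            if start != node:
                continue
            candidate = (score * confidence, text + char)
            if best[end] is None or candidate[0] > best[end][0]:
                best[end] = candidate
    return best[n_segments]


SCORE, TEXT = best_path(EDGES, 8)
print(TEXT, round(SCORE, 3))
```

Unlike the lexicon-driven model, no word list is consulted here; the string is read directly off the best path.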
20
Holistic Features
  • Reference Lines
  • Slant Norm
  • Turn Points
  • Ascender
  • Position Grid and gaps
  • Descender
21
Lexicon Reduction and Verification
22
Outline
  • Success in Postal Application
  • Role of Handwriting Recognition
  • Recognition Models
  • Interactive Cognitive Models
  • New Research Areas
  • Other Applications

23
Grapheme Models
24
Structural Features: BAG
Figure labels: Loops, End, Junction, End, Loop, Turns
25
Feature Extraction and Ordering
  • Removing a critical node disconnects a connected
    component.
  • 2-degree critical nodes keep the feature ordering
    from left to right.
Figure labels: Loops, End, End, Turns, Junction,
Turns, Loop; Left Component, Right Component
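
The notion of a critical node can be illustrated with a brute-force check: remove each node in turn and test whether the rest of the graph is still connected. This is only a sketch on an invented five-node graph (a loop a-b-c with a tail c-d-e). It assumes the input graph is connected, and it is not the feature extractor from the talk.

```python
from collections import deque

# Invented example graph as an adjacency map: a loop (a, b, c) with a tail (d, e).
ADJACENCY = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}


def is_connected_without(adjacency, removed):
    """Breadth-first search over the graph with one node left out."""
    nodes = [node for node in adjacency if node != removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor != removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(nodes)


def critical_nodes(adjacency):
    """Nodes whose removal disconnects the (assumed connected) graph."""
    return sorted(n for n in adjacency if not is_connected_without(adjacency, n))


CRITICAL = critical_nodes(ADJACENCY)
# A 2-degree critical node has exactly two neighbors, so removing it
# leaves one component on each side.
TWO_DEGREE = [n for n in CRITICAL if len(ADJACENCY[n]) == 2]
print(CRITICAL, TWO_DEGREE)
```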
26
Continuous Attributes
grapheme pos orientation angle
Down cusp 3.0 -90°
Up loop
Down arc
27
Stochastic Model
28
Observations
29
Results
Lex size   Top   WMR     SM CA
10         1     96.86   96.56
           2     98.80   98.77
100        1     91.36   89.12
           2     95.30   94.06
1000       1     79.58   75.38
           2     88.29   86.29
20000      1     62.43   58.14
           2     71.07   66.49
30
Interactive Models (McClelland and Rumelhart,
Psychological Review, 1981)
  • Words: ABLE, TRIP, TRAP
  • Letters: A, T, N
  • Features
31
Interactive Recognition
  • Lexicon 1: West Central Street, West Main Street,
    Sunset Avenue
  • Lexicon 2: West Central Street, East Central
    Street, Sunset Avenue
  • Lexicon 3: West Central Street, West Central
    Avenue, Sunset Avenue
Interactive Model
  • Features: T-crossings, loops, ascenders,
    descenders, length
  • Image
32
Adaptive Character Recognition (Park and
Govindaraju, IEEE CVPR 2000)
  • Adaptive selection of features
  • Adaptive number of features
  • Adaptive resolutions
  • Adaptive sequencing of features
  • Adaptive termination conditions

33
Features
4 gradient features
5 moment features
Vector code book
34
Feature Space
  • V x Nc x Ixy
  • 29 x 10 x 85 (quad tree, 4 levels)
  • Recognition rate and feature V
  • GSC V 2512
  • Tradeoffs: space vs. accuracy
  • Hierarchical space with additional resolution and
    features as needed
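
The factor 85 is consistent with a full quad tree of 4 levels, which has 1 + 4 + 16 + 64 nodes. The short check below works through that arithmetic and the resulting product 29 x 10 x 85. The function name `quadtree_nodes` is introduced here for illustration, and the code makes no claim about what V, Nc, and Ixy stand for beyond the three numbers on the slide.

```python
def quadtree_nodes(levels):
    """Number of nodes in a full quad tree: each node has four children per level."""
    return sum(4 ** depth for depth in range(levels))


NODES = quadtree_nodes(4)        # 1 + 4 + 16 + 64
FEATURE_SPACE = 29 * 10 * NODES  # the three factors listed on the slide
print(NODES, FEATURE_SPACE)
```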

35
Active Recognition Using Quad Trees
36
Experimental Results
37
(No Transcript)
38
Results
Classifier      Active Model   Neural Net   KNN
Top 1           95.7           96.4         95.7
Templates       612            976          3,777
Msec/char       1.45           11.5         384
Training hrs    1              24           1
25,656 training and 12,242 test (Postal NIST)
39
Outline
  • Success in Postal Application
  • Role of Handwriting Recognition
  • Recognition Models
  • Interactive Cognitive Models
  • New Research Areas
  • Other Applications

40
Fast Recognition
  • Reuse matched characters
  • Reuse matched sub-strings
  • Parallel processing
41
Combination and Dynamic Selection (Govindaraju
and Ianakiev, MCS 2000)
Diagram labels: image, Lexicon, <55, WR 1, Top 50,
WR 2, Top 5, WR 3, 1
  • Optimization problem
  • Combinatorial explosion in
      • arrangement of recognizers
      • lexicon reduction levels
42
Lexicon Density (Govindaraju, Slavik, and Xue,
IEEE PAMI 2002)
  • Lexicon 1: Me, He, So, To, In
  • Lexicon 2: Me, Memo, Memory, Memoirs, Mellon
43
Classifier Performance Prediction (Xue and
Govindaraju, IEEE PAMI 2002)
  • q: probability that the recognizer makes a unit
    distance error
  • D: average distance between any two words in the
    lexicon
  • n: lexicon size
  • p: performance
  • a, k: model parameters
ln(-ln p) = (ln q) D + a ln ln n + ln k
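
Reading the relation on this slide as ln(-ln p) = (ln q) D + a ln ln n + ln k (the operators are an assumption, since they did not survive in the transcript), it can be solved for p. The sketch below does that with invented parameter values. It needs n > 1 so that ln ln n is defined, and q, k > 0.

```python
import math


def predicted_performance(q, D, n, a, k):
    """Solve ln(-ln p) = (ln q) D + a ln ln n + ln k for p."""
    log_neg_log_p = math.log(q) * D + a * math.log(math.log(n)) + math.log(k)
    return math.exp(-math.exp(log_neg_log_p))


# All parameter values below are invented for illustration.
Q, A, K = 0.5, 1.0, 0.2
P_SMALL_LEXICON = predicted_performance(Q, D=3.0, n=10, a=A, k=K)
P_LARGE_LEXICON = predicted_performance(Q, D=3.0, n=20000, a=A, k=K)
P_SPARSE_LEXICON = predicted_performance(Q, D=6.0, n=20000, a=A, k=K)
print(P_SMALL_LEXICON, P_LARGE_LEXICON, P_SPARSE_LEXICON)
```

With q below 1 and a above 0, the predicted performance falls as the lexicon grows and rises as the average distance D between entries increases.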
44
Outline
  • Success in Postal Application
  • Role of Handwriting Recognition
  • Recognition Models
  • Interactive Cognitive Models
  • New Research Areas
  • Other Applications

45
Bank Check Recognition
46
PCR Trend Analysis
47
NYS EMS PCR Form
  • NYS PCR Example
      • Thousands are filed a day.
      • Passed from EMS to Hospital.
  • PCR Purpose
      • Medical care/diagnosis
      • Legal Documentation
      • Quality Assurance
  • EMS Abbreviations
      • COPD: Chronic Obstructive Pulmonary Disease
      • CHF: Congestive Heart Failure
      • D/S: Dextrose in Saline
      • PID: Pelvic Inflammatory Disease
      • GSW: Gunshot Wound
      • NKA: No known allergies
      • KVO: Keep vein open
      • NaCl: Sodium Chloride

48
Medical Text Recognition and Data Mining
49
Reading Census Forms
Lexicon Anomalies
  • Space: "sales man" and "salesman"
  • Morphology, Abbreviation: "acct manager" and
    "account management"
  • Plural: "school" and "schools"
  • Typographical: "managar" and "manager"
50
Binarization
51
Historic Manuscripts
52
Summary
  • Handwriting recognition technology
  • Pattern recognition task
  • Lexicon holds domain specific knowledge
  • Adaptive methods
  • Classifier combination methods
  • Many applications