Handwritten Character Recognition using Hidden Markov Models
1
Handwritten Character Recognition using Hidden
Markov Models
  • Quantifying the marginal benefit of exploiting
    correlations between adjacent characters and words

2
Optical Character Recognition
  • Rich field of research with many applicable
    domains
  • Off-line vs. On-line (on-line includes time-sequence info)
  • Handwritten vs. Typed
  • Cursive vs. Hand-printed
  • Cooperative vs. Random Writers
  • Language-specific differences of grammar and
    dictionary size
  • We focus on an off-line, mixed-modal English data set, mostly hand-printed with some cursive samples
  • Each observation is a monochrome bitmap of a single letter, with the segmentation problem already solved for us (though poorly)
  • Pre-processing of the data set for noise filtering and scale normalization is also assumed to be done

3
Common Approaches to OCR
  • Statistical Grammar Rules and Dictionaries
  • Feature Extraction of observations
  • Global features: moments and invariants of the image (e.g., percentage of pixels in a certain region, measuring curvature); see the sketch after this list
  • Local features: grouped windows around image pixels
  • Hidden Markov Models
  • Used mostly in the cursive domain, for easy training and to avoid segmentation issues
  • Most HMMs use very large models with words as states, combined with the above approaches; this is more applicable to domains with a small dictionary size and other restrictions
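As a toy illustration of a global feature (ours, not the authors'), the sketch below computes the fraction of on-pixels in each quadrant of a 16x8 character bitmap:

    import numpy as np

    def quadrant_densities(bitmap):
        # Global feature sketch: fraction of on-pixels in each quadrant
        # of a 16x8 monochrome character bitmap (illustrative only).
        h, w = bitmap.shape               # expected (16, 8)
        hr, hc = h // 2, w // 2
        quads = [bitmap[r:r + hr, c:c + hc]
                 for r in (0, hr) for c in (0, hc)]
        return np.array([q.mean() for q in quads])  # densities in [0, 1]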

4
Visualizing the Dataset
  • Data collected from 159 subjects with varying styles, both printed and cursive
  • The first letter of each word is dropped, to avoid dealing with capital letters
  • Each character is represented by a 16x8 array of bits
  • Character meta-data includes the correct labels and end-of-word boundaries
  • Pre-processed into 10 cross-validation folds (a possible record layout is sketched below)
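A minimal sketch of how one character record could be represented in code; the field names are illustrative assumptions, not the data set's actual format:

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class CharSample:
        # One pre-segmented character (field names are our own).
        pixels: np.ndarray   # 16x8 array of 0/1 bits
        label: str           # correct lowercase letter, 'a'..'z'
        word_end: bool       # True if last letter of its word
        fold: int            # cross-validation fold, 0..9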

5
Our Approach: HMMs
  • Primary Goal: Quantify the impact of correlations between adjacent letters and words
  • Secondary Goal: Learn an accurate classifier for our data set
  • Our Approach: Use an HMM and compare it to other algorithms
  • The HMM's 26 states each represent a letter of the alphabet
  • Supervised learning of the model with labeled data
  • Prior probabilities and the transition matrix are learned from letter frequencies in the training data
  • The learning algorithm for emission probabilities uses the Naïve Bayes assumption (i.e., pixels are conditionally independent given the letter)
  • The Viterbi algorithm predicts the most probable sequence of states given the observed character pixel maps (see the sketch below)
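The following is a condensed sketch of this pipeline, not the authors' code: supervised estimation of the prior, transition matrix, and Naïve Bayes emission parameters, plus Viterbi decoding. Log-space arithmetic and add-one counts are our assumptions:

    import numpy as np

    A = 26  # one HMM state per letter of the alphabet

    def learn_hmm(words):
        # words: list of (labels, bitmaps) pairs, one per word; labels are
        # letter indices 0..25, bitmaps are 16x8 arrays of 0/1 bits.
        # Counts start at 1 (add-one smoothing), an assumption standing in
        # for the "hallucinated examples" fix described on the next slide.
        prior = np.ones(A)
        trans = np.ones((A, A))
        on = np.ones((A, 16, 8))
        total = np.full(A, 2.0)
        for labels, bitmaps in words:
            prior[labels[0]] += 1                    # word-initial letter frequency
            for prev, cur in zip(labels, labels[1:]):
                trans[prev, cur] += 1                # adjacent-letter transitions
            for lab, bmp in zip(labels, bitmaps):
                on[lab] += bmp                       # per-pixel on counts
                total[lab] += 1
        log_prior = np.log(prior / prior.sum())
        log_trans = np.log(trans / trans.sum(axis=1, keepdims=True))
        p_on = on / total[:, None, None]             # Bernoulli pixel parameters
        return log_prior, log_trans, p_on

    def log_emission(bitmap, p_on):
        # log P(bitmap | letter) for all 26 letters under the Naive Bayes
        # assumption: pixels conditionally independent given the letter.
        return (np.log(p_on) * bitmap
                + np.log1p(-p_on) * (1 - bitmap)).sum(axis=(1, 2))

    def viterbi(bitmaps, log_prior, log_trans, p_on):
        # Most probable letter sequence for one word's bitmaps.
        T = len(bitmaps)
        delta = np.empty((T, A))
        back = np.zeros((T, A), dtype=int)
        delta[0] = log_prior + log_emission(bitmaps[0], p_on)
        for t in range(1, T):
            scores = delta[t - 1][:, None] + log_trans   # scores[i, j]
            back[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) + log_emission(bitmaps[t], p_on)
        path = [int(delta[-1].argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]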

6
Algorithms and Optimizations
  • Five learning algorithms were implemented and tested
  • Baseline Algorithm: Naïve Bayes classifier (no HMM)
  • Algorithm 2: NB with maximum-probability classification over a set of shifted observations
  • Motivation: compensate for correlations between adjacent pixels that the Naïve Bayes assumption ignores
  • Algorithm 3: HMM with the NB assumption
  • Fix for incomplete data: examples hallucinated prior to training
  • Algorithm 4: Optimized HMM with the NB assumption
  • Ignores the effects of inter-word transitions when learning the HMM
  • Algorithm 5: Dictionary creation and lookup with the NB assumption (no HMM); see the sketch after this list
  • Geared toward a specific data set with a small dictionary, but less generalizable to less constrained data sets with larger dictionaries
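A rough sketch of the dictionary-lookup idea in Algorithm 5, reusing log_emission and p_on from the slide-5 sketch; the scoring rule and dictionary layout are our assumptions:

    import numpy as np

    def dictionary_decode(bitmaps, dictionary, p_on):
        # dictionary: maps word length -> list of candidate words, each a
        # tuple of letter indices (this layout is our assumption).
        # Score each length-matched candidate by its summed Naive Bayes
        # log-likelihood and return the best one.
        ll = np.array([log_emission(b, p_on) for b in bitmaps])  # (T, 26)
        best_word, best_score = None, -np.inf
        for word in dictionary.get(len(bitmaps), []):
            score = sum(ll[t, letter] for t, letter in enumerate(word))
            if score > best_score:
                best_word, best_score = word, score
        return best_word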

7
Alternative Algorithms and Experimental Setup
  • Other variants were considered but not implemented
  • Joint Bayes parameter estimation (too many probabilities to learn: on the order of 2^128 joint pixel configurations, vs. 26 x 128 = 3,328 independent pixel probabilities under Naïve Bayes)
  • HMM with a 2nd-order Markov assumption (exponential growth in the number of Viterbi paths)
  • Training Naïve Bayes over a set of shifted and overlaid observations (preprocessing to create a thicker boundary; see the sketch after this list)
  • All experiments run with 10-fold cross-validation
  • Results given as averages with standard deviations
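One plausible realization of the shift-and-overlay preprocessing (a guess at the intent, not the authors' implementation): OR each bitmap with its four one-pixel shifts, thickening strokes so that small alignment differences matter less:

    import numpy as np

    def thicken(bitmap):
        # Overlay a 16x8 array of 0/1 ints with its one-pixel shifts in
        # all four directions via logical OR, producing thicker strokes.
        out = bitmap.copy()
        out[1:, :] |= bitmap[:-1, :]    # copy shifted down
        out[:-1, :] |= bitmap[1:, :]    # copy shifted up
        out[:, 1:] |= bitmap[:, :-1]    # copy shifted right
        out[:, :-1] |= bitmap[:, 1:]    # copy shifted left
        return out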

8
Experimental Results
9
Conclusions
  • The Naïve Bayes classifier did well on its own (62.7% accuracy, 15x better than a random classifier!)
  • Classification on shifted data did worse, since we lost data at the edges!
  • The data set's small dictionary size affected the results
  • The optimized HMM with NB achieves 71% accuracy
  • The optimizations are only marginally significant because of the data set
  • A simpler, more flexible approach for achieving impressive results on other data sets
  • The dictionary approach is almost perfect, with 99.3% accuracy!
  • Demonstrates the additional benefit of exploiting domain constraints and grammatical or syntactic rules
  • Not always feasible: the dictionary may be unknown or too large, or the data may not be predictable