Transcript and Presenter's Notes

Title: Style and Adaptation


1
Style and Adaptation
  • Harsha Veeramachaneni
  • Automated Reasoning Systems Division
  • IRST, Trento, Italy

2
Outline
  • Introduction
  • Style-conscious field classification
  • Model for continuously distributed styles
  • Adaptation
  • Conclusion

3
Terminology
  • A field is a group of patterns that share a common source

4
Context
  • Context: inter-pattern statistical dependence in pattern fields
  • Linguistic context: class dependence

[Example: of two readings of the field, one is statistically more likely given linguistic context]
5
Style context
6
Example: multiple writers
7
Style context
  • Style context: a type of inter-pattern feature dependence
  • i.e., in a field, the renderings of the singlet patterns are not independent of one another
  • Present due to multiplicity of dissimilar sources
  • Can be exploited to reduce inter-source confusions

8
Styles in other domains
9
Utilizing style context
10
Next
  • Introduction
  • Style-conscious field classification
  • Model for continuously distributed styles
  • Adaptation
  • Application
  • Conclusion

11
Desirable properties of style-conscious
classifiers
  • Should be accurately trainable from readily available training data carrying source and class labels
  • But labeled training fields of all lengths are
    difficult to obtain
  • Should approach the average intra-style accuracy
    asymptotically with field length
  • Should be computationally feasible

12
Main assumptions
  • Test fields have style consistency, but the source identity is unknown
  • The context in the class labels of the patterns is independent of the source
  • Within a source, the singlet patterns in a field are independent of one another given the field-class (see the factorization below)

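In symbols (my own notation, not taken from the slides), for a field x = (x_1, ..., x_L) with class labels c = (c_1, ..., c_L) produced by source s, these assumptions read

    p(c \mid s) = p(c), \qquad
    p(x_1, \dots, x_L \mid c, s) = \prod_{i=1}^{L} p(x_i \mid c_i, s).

Style context appears because s is unknown at test time: marginalizing over s couples the singlet patterns even though they are independent given the source.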
13
Style-conscious classification
[Diagram: training samples for all 10^10 field-classes → training → choose one of the 10^10 field-classes → field classification result, e.g., 518 276 9999]
14
Method 2: Font identification
[Diagram: training samples for all 10 classes for all fonts → training → font-specific singlet classifiers; font recognition on the test field selects a font-specific singlet classifier → field classification result, e.g., 518 276 9999]
15
Method 3: Style-conscious quadratic field classifier
  • Two singlet classes: A and B
  • Two writers: Sam and Joe
  • One feature per singlet

16
Terminology (contd.)
[Diagram: a test field written by one of the writers → feature extraction → singlet feature vectors x1, x2 → field feature vector y = (x1, x2)]
17
Field feature distributions
18
19
Quadratic field classifier
Field-class mean
Field-class covariance matrix
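As a hedged reconstruction of the formulas pictured on this slide (the notation is mine): assuming a Gaussian model for the field feature vector y with field-class mean \mu_c and field-class covariance matrix \Sigma_c, the quadratic field classifier scores each field-class c as

    g_c(y) = -\tfrac{1}{2} (y - \mu_c)^{\top} \Sigma_c^{-1} (y - \mu_c)
             - \tfrac{1}{2} \log \lvert \Sigma_c \rvert + \log P(c),
    \qquad \hat{c} = \arg\max_c g_c(y).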
20
Means and covariance matrices
  • Field-class means are singlet-class means
    concatenated
  • Diagonal blocks in field covariance matrices are
    singlet covariance matrices
  • Means and covariance matrices for any field length can be constructed from a few basic blocks (see the sketch below)

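A minimal sketch (my own code, not the authors') of how the field-level parameters are assembled from singlet-level blocks; mu, sigma, and cross are assumed to be dictionaries of singlet-class means, singlet covariance matrices, and cross-covariance matrices keyed by class (pairs):

import numpy as np

def field_mean(mu, field_class):
    # Field-class mean: singlet-class means concatenated.
    return np.concatenate([mu[c] for c in field_class])

def field_cov(sigma, cross, field_class):
    # Field-class covariance: singlet covariances on the diagonal blocks,
    # cross-covariances on the off-diagonal blocks.
    L = len(field_class)
    d = sigma[field_class[0]].shape[0]
    S = np.zeros((L * d, L * d))
    for i, ci in enumerate(field_class):
        for j, cj in enumerate(field_class):
            block = sigma[ci] if i == j else cross[(ci, cj)]
            S[i * d:(i + 1) * d, j * d:(j + 1) * d] = block
    return S

The same few blocks serve every field length, which is what makes the classifier trainable from singlet-level statistics.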
21
Cross-covariance matrices
  • The cross-covariance matrices depend only on the source-conditional class means (see the formula below)
  • Can be estimated accurately
  • Style context is due to the variation of class means across sources

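In my notation (an assumed form of the slide's formula): if \mu_c^{(s)} denotes the class-c mean of source s and \mu_c = E_s[\mu_c^{(s)}], then for two positions of a field carrying classes c and d,

    \mathrm{Cov}(x_1, x_2 \mid c, d) = E_s\big[(\mu_c^{(s)} - \mu_c)(\mu_d^{(s)} - \mu_d)^{\top}\big].

Because the singlets are conditionally independent given the source, the within-source noise contributes nothing off the diagonal, so the cross-covariance depends only on the source-conditional class means, as stated above.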
22
Next
  • Introduction
  • Style-conscious field classification
  • Model for continuously distributed styles
  • Adaptation
  • Application
  • Conclusion

23
Model for continuously distributed styles
Under such a model, our quadratic field classifier is optimal
24
A model for continuously distributed styles
  • A source is identified by its singlet class means
  • The sources are Gaussian distributed
  • The class means are correlated (inter-class correlation)
  • The singlet covariance matrices are style-independent (the model is sketched below)

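A minimal sketch of this generative model in my own notation (not verbatim from the slides):

    \mu^{(s)} = \big(\mu_1^{(s)}, \dots, \mu_N^{(s)}\big) \sim \mathcal{N}(\bar{\mu}, B),
    \qquad
    x \mid c, s \sim \mathcal{N}\big(\mu_c^{(s)}, \Sigma_c\big),

where the covariance B of the stacked class means carries the inter-class correlation and the singlet covariances \Sigma_c do not depend on the style s.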
25
Experimental results: Handwritten numerals
  • Database: NIST Special Database 19 (handwritten digits with writer labels)
  • 10 samples per digit per writer
  • Training set: 494 writers, 54,193 samples
  • Test set: 499 new writers, 54,481 samples
  • Features: 100 directional chain-code features

26
Error rates: style-conscious quadratic classifier
27
Provable Properties
  • Inter-class style > intra-class style
  • The asymptotic error rate (with field length) approaches the within-style error rate
  • Order-independent classification
  • The error rate as a function of field length can be bounded for a two-class problem

28
Example
29
A Breather
The hidden style variable s: shadow / no shadow
E.H. Adelson, http://web.mit.edu/persci/people/adelson/checkershadow_illusion.html
30
Styles?
The hidden style variable s: shadow / no shadow
E.H. Adelson, http://web.mit.edu/persci/people/adelson/checkershadow_illusion.html
31
Next
  • Introduction
  • Style-conscious field classification
  • Model for continuously distributed styles
  • Adaptation
  • Application
  • Conclusion

32
Decision Directed Adaptation
  • Nagy and Shelton, "Self-Corrective Character Recognition System," IEEE Trans. Information Theory, April 1966

33
Decision Directed Adaptation

[Scatter plot: training-set points labeled A and B in a two-dimensional feature space (Feature 1 vs. Feature 2), the decision boundary of the classifier learnt on the training set, and unlabeled test-set points marked ?]
34
Decision Directed Adaptation
  • If the classifier is a minimum-distance-to-centroid classifier, decision-directed adaptation is equivalent to k-means (see the sketch below)
  • Errors in the test-set labels are not uniform
  • Implies that even with an infinite training set and an infinite test set from the same distribution, the classification still changes

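A minimal sketch (my own code, not from the presentation) of decision-directed adaptation for a minimum-distance-to-centroid classifier; iterating the classify-and-retrain loop on the test set is exactly k-means seeded with the training-set centroids:

import numpy as np

def decision_directed_adaptation(train_x, train_y, test_x, n_iter=5):
    # Start from centroids learnt on the labeled training set.
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    for _ in range(n_iter):
        # Classify the test set with the current centroids.
        dists = np.linalg.norm(test_x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Retrain: re-estimate each centroid from the test points assigned to it.
        for k in range(len(classes)):
            if np.any(labels == k):
                centroids[k] = test_x[labels == k].mean(axis=0)
    return classes[labels], centroids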
35
Adaptation to exploit style context
  • We have a classifier trained on several styles
  • The test set is from one (unknown) style
  • The classifier adapts to the test style by estimating the style parameters from the test set
  • The test set is then reclassified with the new parameters

36
Adaptation: Only class means
  • Model the test data as a mixture of Gaussians with unknown means
  • Assume that the a priori class probabilities and class-conditional covariance matrices are known
  • Use the EM algorithm for maximum-likelihood (ML) estimation of the class means (see the sketch below)
  • But from the model for continuously distributed styles, we have a distribution of class means

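A sketch (my own, with assumed parameter names) of EM for ML estimation of the class means, holding the class priors and covariances fixed as the slide assumes:

import numpy as np
from scipy.stats import multivariate_normal

def em_adapt_means(test_x, priors, covs, init_means, n_iter=20):
    # Gaussian-mixture EM in which only the component means are re-estimated.
    means = np.array(init_means, dtype=float)
    K = len(priors)
    for _ in range(n_iter):
        # E-step: responsibility of each class for each test sample.
        resp = np.stack([priors[k] * multivariate_normal.pdf(test_x, means[k], covs[k])
                         for k in range(K)], axis=1)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: each class mean becomes the responsibility-weighted sample mean.
        for k in range(K):
            means[k] = resp[:, k] @ test_x / resp[:, k].sum()
    return means, resp

Starting the means at the training-set class means and reclassifying with the adapted means implements the adaptation described on the previous slide.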
37
Gaussian distributed styles
38
Adaptation: Class means
  • We can use a Bayesian EM algorithm for maximum a posteriori (MAP) estimation of the component means of a Gaussian mixture (the M-step update is sketched below)
  • As the size of the test set (field length) increases, the MAP estimates become identical to the ML estimates

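A sketch of the MAP M-step, treating each class mean independently for simplicity (the full model's inter-class correlation would couple the updates; the notation is mine): with prior \mu_c \sim \mathcal{N}(m_c, B_c), known covariance \Sigma_c, responsibilities r_{ic}, n_c = \sum_i r_{ic}, and weighted sample mean \bar{x}_c = \sum_i r_{ic} x_i / n_c,

    \hat{\mu}_c = \big(n_c \Sigma_c^{-1} + B_c^{-1}\big)^{-1}
                  \big(n_c \Sigma_c^{-1} \bar{x}_c + B_c^{-1} m_c\big).

As n_c grows, the prior term is swamped and \hat{\mu}_c tends to \bar{x}_c, which is why the MAP estimates become identical to the ML estimates for long fields.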
39
How does MAP adaptation accomplish style-conscious classification?
[Diagram: training samples from many different writers → training; test set → MAP estimation of class means → style-specific singlet classifier → classification result]
MAP adaptation is analogous to font identification, but for continuously distributed styles.
40
Example
41
Decision boundaries in field feature space
Optimal field classifier (optimized for character error rate)
Optimal field classifier (optimized for field error rate)
42
Decision boundaries (contd.)
MAP adaptive classifier (inter-class style context)
MAP adaptive classifier (intra-class style context)
ML adaptive classifier
ML adaptive classifier
43
Some results on the MNIST data
Error rate without adaptation: 2.12%
Error rates for adaptive classification as a function of field length
44
Small-sample adaptation: Two types of non-representative training sets
  • We adapt not only when the training set is different from the test set, but also when it is small
  • Semi-supervised classification
  • Claim: no different from style adaptation

45
Small-sample adaptation
  • We perform style-constrained classification of the test set under the posterior distribution of all possible problems as styles.

46
Conclusions
  • Style context is due to the commonality of the source of a field's patterns.
  • Modeling style context with second-order statistics (pairwise correlations) allows efficient estimation of the model parameters.
  • Style-conscious classifiers approach
    style-specific accuracy with increasing field
    length.
  • Adaptation is another means for exploiting style
    context.
  • MAP adaptation can be viewed as style-first
    recognition.
  • Adaptation and semi-supervised classification can
    be studied under the umbrella of style.

47
Thank you
48
Adaptation by clustering
  • Labeling based on linguistic constraints: give the clusters dummy labels and solve a substitution cipher
  • Labeling based on the training set: label each cluster by the closest class in the training set

49
Common method for adaptation: Clustering
  • Cluster the patterns in the test data
  • Each cluster is a different class
  • Label the clusters either based on linguistic constraints or based on proximity to the training set (a sketch of the proximity-based variant follows)

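A minimal sketch (my own code, with kmeans2 as one possible clustering routine) of the proximity-based variant: cluster the test data, then label each cluster by the nearest training-class centroid:

import numpy as np
from scipy.cluster.vq import kmeans2

def cluster_and_label(train_x, train_y, test_x, n_clusters):
    # Class centroids from the labeled training set.
    classes = np.unique(train_y)
    class_centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    # Cluster the (unlabeled) test data.
    cluster_centroids, assignment = kmeans2(test_x, n_clusters, minit='points')
    # Label each test cluster by the closest training class.
    d = np.linalg.norm(cluster_centroids[:, None, :] - class_centroids[None, :, :], axis=2)
    cluster_to_class = classes[d.argmin(axis=1)]
    return cluster_to_class[assignment]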
50
EM for clustering
  • Expectation-Maximization (EM) algorithm: an iterative method for parameter estimation
  • The test data is modeled as a mixture of parameterized distributions
  • We can use EM for mixture identification (i.e., to estimate the parameters)
  • Traditionally, the EM algorithm is used to obtain maximum-likelihood estimates of the parameters

51
Example
52
Example
53
Adaptation and Style
  • What is the cause of non-representative training
    data for our problem?
  • Training data comes from a large number of styles, while the test data is from one style
  • Adaptation is another means to exploit style
    consistency

54
Adaptation and Style
55
Style-conscious classification - Recap
  • Short fields (L = 2, 3, ...)
    • Few discrete styles (e.g., printed text in a few different fonts)
      • Multi-modal field classifier (one mode per style)
      • Style-conscious quadratic field classifier
    • Continuous styles (e.g., different writers)
      • Style-conscious quadratic field classifier
  • Long fields (L = 10, 20, 30, ...)
    • Adaptation
      • ML (when styles are discrete, or when fields are long enough to make ML and MAP the same)
      • MAP

56
Next
  • Introduction
  • Style-conscious field classification
  • Model for continuously distributed styles
  • Adaptation
  • Application
  • Conclusion

57
Error rates: ML adaptation
Singlet error rates; test fields: one test writer at a time

  Method                         Error rate (%)
  No adaptation                  1.40
  Mean adaptation                0.88
  Mean + covariance adaptation   0.83

Out of 499 test writers, 140 improved and 30 worsened. Max improvement for any writer: 20 samples; max worsening: 3 samples.
58
Next
  • Introduction
  • Style-conscious field classification
  • Model for continuously distributed styles
  • Adaptation
  • Application
  • Conclusion

59
Directions for future research
  • Further computational improvements for the
    quadratic field classifier
  • Quantification of the amount of useful style
  • Methods for multiple initializations of the EM
    algorithm
  • Formal study of adaptation encompassing the EM
    algorithm (initialization and convergence to
    local optima)
  • Other application areas

60
Method 1: Decision-directed adaptation
  • Observation: some errors in the training set can be tolerated
  • Assume the classifier trained on the training set has a low error rate
  • Assume the test set is large enough
  • Method: classify the test set and retrain the classifier with the labeled test set
  • Disadvantages:
    • Errors in the test-set labels are not uniform
    • No proof that it will improve accuracy
    • The training set is not a fixed point

61
Bounded search algorithm
  • A fast implementation of the quadratic field classifier
  • Greatly reduces the number of field quadratic discriminant values that are computed for a field-class
  • We show that the field discriminant value for a field-class can be bounded from below by a function of quantities computed by the quadratic singlet classifier

62
Bounded search algorithm
63
Styles in various classification problems
64
Experimental results: Machine-printed numerals
  • Machine-printed numerals in five fonts
  • Training set: 250 samples/digit/font, 12,500 samples
  • Test set: 250 samples/digit/font, 12,500 new samples
  • Features: 64 directional chain-code features, of which 8 principal-component features were used

65
Error counts: style-conscious classification
Number of errors for 5 test fonts (out of 2,500 samples each)
T = Times Roman, V = Verdana, A = Avant Garde, B = Bookman Old Style, H = Helvetica
66
Field reject-error trade-off: style-conscious quadratic classifier