Title: Combining Shape and Physical Models for Online Cursive Handwriting Synthesis
1. Combining Shape and Physical Models for Online Cursive Handwriting Synthesis
- Jue Wang (University of Washington)
- Chenyu Wu (Carnegie Mellon University)
- Ying-Qing Xu (Microsoft Research Asia)
- Heung-Yeung Shum (Microsoft Research Asia)
International Journal on Document Analysis and Recognition (IJDAR), 2004
2. Introduction
- Handwriting computing techniques (pen-based devices)
  - Handwriting recognition
    - makes it possible for computers to understand the information contained in handwriting
  - Handwriting modulation
    - handwriting editing, error correction, script searching
3. Introduction
- Handwriting modeling and synthesis
  - Movement-simulation techniques
    - based on motor models; try to model the process of handwriting production
    - focus on the representation and analysis of real handwriting signals rather than on handwriting synthesis
4. Introduction
- Shape-simulation methods
  - consider the static shape of the handwriting trajectory
  - more practical than movement-simulation techniques when dynamic information is not available
  - straightforward approach: synthesize from collected handwritten glyphs
  - learning-based cursive handwriting synthesis approach
5. Introduction
- A successful handwriting synthesis algorithm
  - shapes of synthesized letters vs. training samples
  - connections between synthesized letters
- A novel cursive handwriting synthesis technique
  - combines the advantages of the shape-simulation and the movement-simulation methods
6. Outline
- Sample collection and segmentation
- Learning strategies
- Synthesis strategies
- Experimental results
- Discussion and conclusion
7. Sample Collection
- About 200 words
- Each letter appears more than 5 times
- The handwriting samples are first passed through a low-pass filter and then re-sampled to produce equidistant points
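The preprocessing on this slide can be sketched in Python as follows. This is only an illustration: the slide does not say which low-pass filter was used, so a moving average stands in for it, and the names `smooth`, `resample_equidistant`, `window` and `spacing` are assumptions made for the example.

```python
import math

def smooth(points, window=5):
    """Moving-average low-pass filter over (x, y) points (assumed filter type)."""
    half = window // 2
    out = []
    for i in range(len(points)):
        lo, hi = max(0, i - half), min(len(points), i + half + 1)
        n = hi - lo
        out.append((sum(p[0] for p in points[lo:hi]) / n,
                    sum(p[1] for p in points[lo:hi]) / n))
    return out

def resample_equidistant(points, spacing):
    """Walk along the polyline and emit a point every `spacing` units of arc length."""
    out = [points[0]]
    carried = 0.0  # arc length travelled since the last emitted point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg == 0.0:
            continue  # skip repeated points
        pos = spacing - carried  # offset of the next output point within this segment
        while pos <= seg:
            t = pos / seg
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            pos += spacing
        carried = seg - (pos - spacing)
    return out

# Example: an unevenly sampled straight stroke becomes evenly spaced.
stroke = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (10.0, 0.0)]
even = resample_equidistant(stroke, 2.0)
```

Spacing is measured along the filtered polyline, so on curved strokes consecutive output points are `spacing` apart in arc length and at most `spacing` apart in straight-line distance.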
8. Sample Segmentation
- Overview
  - Segmentation-based recognition
  - Recognition-based segmentation (relies heavily on the performance of the recognition engine)
  - Level-building
    - simultaneously outputs the recognition and segmentation results
    - segmentation and recognition are merged to give an optimal result
9. A Two-level Framework
- Framework of traditional handwriting segmentation approaches
- Temporal handwriting sequence $z_1, \dots, z_T$
  - $z_t$ is a low-level feature that denotes the coordinate and velocity of the sequence at time t
10. Segmentation
- The segmentation problem is to find the identity string $\{I_1, \dots, I_n\}$, with the corresponding segments of the sequence $\{S_1, \dots, S_n\}$, $S_1 = \{z_1, \dots, z_{t_1}\}, \dots, S_n = \{z_{t_{n-1}}, \dots, z_T\}$, that best explain the sequence
11. Segmentation
- For the training of the writer-independent segmentation system
  - the low-level feature-based segmentation algorithm works well for a small number of writers
  - a script code is calculated from the handwriting data as the middle-level feature
12. Middle Level Feature
- Five kinds of key points are extracted
  - points of maximum/minimum x-coordinate (X+, X-)
  - points of maximum/minimum y-coordinate (Y+, Y-)
  - crossing points
- Average direction of the interval sequence between two adjacent key points
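A minimal sketch of the extremum key points and the direction between two key points might look like the following. The labels `"X+"`, `"X-"`, `"Y+"`, `"Y-"` and the function names are assumptions for this example. Crossing points are not detected, and the straight-line (chord) direction is used as a simple stand-in for the average direction of the interval.

```python
import math

def extremum_key_points(points):
    """Return (index, label) pairs for strict local extrema of x and y."""
    keys = []
    for i in range(1, len(points) - 1):
        for axis, name in ((0, "X"), (1, "Y")):
            prev, cur, nxt = points[i - 1][axis], points[i][axis], points[i + 1][axis]
            if cur > prev and cur > nxt:
                keys.append((i, name + "+"))  # local maximum along this axis
            elif cur < prev and cur < nxt:
                keys.append((i, name + "-"))  # local minimum along this axis
    return keys

def chord_direction(points, i, j):
    """Angle (radians) of the straight line from point i to point j (assumed stand-in)."""
    return math.atan2(points[j][1] - points[i][1], points[j][0] - points[i][0])

# Example trajectory: up, right, down, back.
trajectory = [(0, 0), (1, 2), (2, 0), (1, -1), (0, 0)]
keys = extremum_key_points(trajectory)
```

Because the comparison is strict, flat plateaus are ignored; real pen data would normally be smoothed first so that sampling noise does not create spurious extrema.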
13. Script Code Examples
14. Middle Level Feature
- Samples of each character are divided into several clusters
  - those in the same cluster have a similar structural topology
- Since the length of the script code might not be the same in all cases, the similarity cannot be computed directly
- The script code is modeled as a homogeneous Markov chain
15. Middle Level Feature
- Given two script codes T1, T2
- We may compute the stationary distributions $\pi_1$, $\pi_2$ and the transition matrices A1, A2
- The similarity between two script codes is measured as
16. Middle Level Feature
- The positions of $\pi_1$, $\pi_2$, A1, A2 are enforced symmetrically
- balance the variance of the KL divergence and the difference in code length
- If both the stationary distribution and the transition matrix of two script codes are well matched, and their code lengths are almost the same, then d(T1, T2) is close to 1
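The ingredients named on these slides (transition matrix, stationary distribution, KL divergence) can be sketched in Python. This is not the similarity measure d(T1, T2) itself. The smoothing constant `eps`, the averaged ("lazy") power iteration and all function names are assumptions made for the illustration.

```python
import math

def transition_matrix(code, symbols, eps=1e-3):
    """Row-normalised transition counts; eps is additive smoothing (an assumption)."""
    idx = {s: k for k, s in enumerate(symbols)}
    n = len(symbols)
    counts = [[eps] * n for _ in range(n)]
    for a, b in zip(code, code[1:]):
        counts[idx[a]][idx[b]] += 1.0
    return [[c / sum(row) for c in row] for row in counts]

def stationary_distribution(A, iters=500):
    """Iterate pi <- (pi + pi A) / 2, which has the same fixed point as pi = pi A.

    Averaging with the current estimate avoids oscillation on nearly periodic chains.
    """
    n = len(A)
    pi = [1.0 / n] * n
    for _ in range(iters):
        step = [sum(pi[i] * A[i][j] for i in range(n)) for j in range(n)]
        pi = [0.5 * (p + s) for p, s in zip(pi, step)]
    return pi

def symmetric_kl(p, q):
    """KL(p||q) + KL(q||p) for two strictly positive distributions."""
    return sum((a - b) * math.log(a / b) for a, b in zip(p, q))

# Example with a two-symbol alphabet standing in for real script codes.
A1 = transition_matrix("AABABBBA", "AB")
pi1 = stationary_distribution(A1)
```

Two codes of different lengths yield matrices and distributions of the same size, which is what makes them directly comparable.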
17. Segmentation
- After introducing the script code as the middle-level feature, the optimization problem becomes
- improves the accuracy of segmentation
- dramatically reduces the computational complexity of level-building
18. Graph Model
19. Result
20. Outline
- Sample collection and segmentation
- Learning strategies
- Synthesis strategies
- Experimental results
- Discussion and conclusion
21. Learning Strategies
- Data alignment
  - Trajectory matching
  - Training set alignment
- Shape models
22. Trajectory Matching
- "Segmentation and reconstruction of on-line handwritten scripts" (Pattern Recognition, 1998)
- Each piece is a simple arc; points can be equidistantly sampled from it to represent the stroke
23. Trajectory Matching
- Landmark-point-extraction method
  - pen-down, pen-up points
  - local extrema of curvature
  - inflection points of curvature
- A handwriting sample can be divided into as many as six pieces
- Samples of the same character are mostly composed of the same number of pieces, and these pieces match each other naturally
24. Trajectory Matching
- A handwriting sample can be represented by a point vector
- s: number of static pieces segmented from the sample
- n_i: number of points extracted from the i-th piece
25. Trajectory Matching
- The next step is to align the different vectors into a common coordinate frame
- estimate an affine transform for each sample that transforms the sample into the coordinate frame
- Affine transformations: translation, rotation, scaling
26. Training Set Alignment
- Iterative algorithm ("Learning from one example through shared densities on transforms", IEEE CVPR 2000)
- The deformable energy-based criterion is defined as
27. Training Set Alignment - Algorithm
- Maintain an affine transform matrix Ui for each sample, which is set to identity initially
- Compute the deformable energy-based criterion E
- Repeat until convergence
  - For each one of the six unit affine matrices Aj, j = 1, ..., 6
    - Let $U_i^{new} = A_j U_i$
    - Apply $U_i^{new}$ to the sample and recalculate the criterion E
    - If E has been reduced, accept $U_i^{new}$; otherwise
    - Let $U_i^{new} = A_j^{-1} U_i$ and apply again. If E has been reduced, accept $U_i^{new}$; otherwise revert to Ui
  - End
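The accept-or-revert loop above can be sketched in Python with NumPy. Two things here are assumptions rather than the slide's content: the six unit matrices (x/y translation, rotation, x/y scale, x-shear, each with a small fixed `step`), and the criterion, which is replaced by the squared distance to a fixed reference sample because the deformable energy E is not reproduced in the text.

```python
import numpy as np

def unit_affines(step=0.02):
    """Six small 3x3 homogeneous transforms (an assumed parameterisation)."""
    c, s = np.cos(step), np.sin(step)
    return [
        np.array([[1, 0, step], [0, 1, 0], [0, 0, 1.0]]),      # x-translation
        np.array([[1, 0, 0], [0, 1, step], [0, 0, 1.0]]),      # y-translation
        np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]]),        # rotation
        np.array([[1 + step, 0, 0], [0, 1, 0], [0, 0, 1.0]]),  # x-scale
        np.array([[1, 0, 0], [0, 1 + step, 0], [0, 0, 1.0]]),  # y-scale
        np.array([[1, step, 0], [0, 1, 0], [0, 0, 1.0]]),      # x-shear
    ]

def transform_points(U, pts):
    """Apply a 3x3 homogeneous transform to an (n, 2) array of points."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    return (homog @ U.T)[:, :2]

def energy(U, pts, reference):
    """Stand-in criterion: squared distance to a fixed reference sample."""
    return float(np.sum((transform_points(U, pts) - reference) ** 2))

def align(samples, reference, max_sweeps=200):
    """Greedy search: try A_j U, then A_j^{-1} U, keep whichever lowers the energy."""
    transforms = []
    for pts in samples:
        U = np.eye(3)
        E = energy(U, pts, reference)
        for _ in range(max_sweeps):
            improved = False
            for A in unit_affines():
                for candidate in (A @ U, np.linalg.inv(A) @ U):
                    E_new = energy(candidate, pts, reference)
                    if E_new < E:
                        U, E, improved = candidate, E_new, True
                        break
            if not improved:
                break  # no unit step helps any more: converged
        transforms.append(U)
    return transforms

# Example: a square shifted away from the reference is pulled back onto it.
reference = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
shifted = reference + np.array([0.5, -0.3])
U_aligned = align([shifted], reference)[0]
```

A criterion that only measures spread across the training set can be lowered by shrinking every sample, so a joint version would need some normalisation of the transforms; the fixed reference used here sidesteps that issue.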
28. Shape Models
- By modeling the distribution of aligned vectors, new examples can be generated that are similar to those in the training set
- As in the Active Shape Model, principal component analysis (PCA) is applied to the data ("Statistical models of appearance for computer vision", draft report, 2000)
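29. Shape Model
- Formally, the covariance of the data is calculated as
  $S = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T$
- Then the eigenvectors $\phi_i$ and corresponding eigenvalues $\lambda_i$ of S are computed and sorted so that $\lambda_i \ge \lambda_{i+1}$
- The training set is approximated by $x \approx \bar{x} + \Phi b$
  - $\Phi = (\phi_1 | \phi_2 | \dots | \phi_t)$ represents the t eigenvectors corresponding to the largest eigenvalues
  - b is a t-dimensional vector given by $b = \Phi^T (x - \bar{x})$
- By varying the elements in b, new handwriting trajectories can be generated from this model
  - apply limits of $\pm 3\sqrt{\lambda_i}$ to the elements $b_i$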
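A PCA shape model of this kind can be sketched in Python with NumPy. The function names, the Gaussian sampling of b and the default limit of three standard deviations (the conventional Active Shape Model choice) are assumptions made for this example, not details confirmed by the slides.

```python
import numpy as np

def fit_shape_model(X, t):
    """X: (N, d) aligned sample vectors. Returns mean, top-t eigenvectors, eigenvalues."""
    mean = X.mean(axis=0)
    S = np.cov(X - mean, rowvar=False)      # (d, d) covariance with 1/(N-1) scaling
    eigvals, eigvecs = np.linalg.eigh(S)    # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:t]   # indices of the t largest eigenvalues
    return mean, eigvecs[:, order], eigvals[order]

def project(x, mean, Phi):
    """Model parameters b of a sample x."""
    return Phi.T @ (x - mean)

def generate(mean, Phi, eigvals, rng, limit=3.0):
    """Draw b from independent Gaussians, clip each element, and rebuild a shape."""
    sd = np.sqrt(np.maximum(eigvals, 0.0))
    b = np.clip(rng.normal(0.0, sd), -limit * sd, limit * sd)
    return mean + Phi @ b, b

# Example with random vectors standing in for aligned handwriting samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))
mean, Phi, lam = fit_shape_model(X, t=2)
x_new, b_new = generate(mean, Phi, lam, rng)
```

Because the columns of `Phi` are orthonormal, projecting a generated shape back with `project` recovers the same b, which is the operation the synthesis loop relies on when it maps a deformed letter back into the model.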
30. Outline
- Sample collection and segmentation
- Learning strategies
- Synthesis strategies
- Experimental results
- Discussion and conclusion
31. Synthesis Strategies
- Generate each individual letter in the word
- Then the baselines of these letters are aligned and the letters are juxtaposed in a sequence
- Concatenate letters with their neighbors to form cursive handwriting
  - this cannot be easily achieved
- To solve this problem, a conditional sampling algorithm based on the delta log-normal model is proposed
32. Individual Letter Synthesis
33. Delta Log-normal Model
- A powerful tool for analyzing rapid human movements
- With respect to handwriting generation, the movement of a simple stroke is controlled by velocity
- The magnitude of the velocity is described as ("Why handwriting segmentation can be misleading?", 13th International Conference on Pattern Recognition, 1996)
  - log-normal function (on a logarithmic scale axis)
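In the delta log-normal literature the velocity magnitude is commonly written as the difference of two scaled log-normal functions, v(t) = D1 Λ(t; t0, μ1, σ1²) − D2 Λ(t; t0, μ2, σ2²). The Python sketch below evaluates that form; the parameter names follow this common notation and are assumptions here, since the slide's own equation is not reproduced in the text.

```python
import math

def lognormal(t, t0, mu, sigma):
    """Log-normal function of (t - t0); zero before the onset time t0."""
    if t <= t0:
        return 0.0
    z = (math.log(t - t0) - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi) * (t - t0))

def delta_lognormal_speed(t, t0, D1, mu1, sigma1, D2, mu2, sigma2):
    """Difference of an agonist and an antagonist log-normal component."""
    return D1 * lognormal(t, t0, mu1, sigma1) - D2 * lognormal(t, t0, mu2, sigma2)

# Example: integrate the speed profile numerically (midpoint rule) to get path length.
dt = 0.001
path_length = sum(
    delta_lognormal_speed((k + 0.5) * dt, 0.0, 5.0, -1.5, 0.3, 1.0, -1.2, 0.3) * dt
    for k in range(5000)
)
```

Each log-normal component integrates to one, so the total distance travelled approaches D1 − D2, which is why that difference is often read as the arc length of the stroke.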
34. Delta Log-normal Model
- The angular velocity can be expressed as
- The angular velocity is calculated as the derivative of the angular position
- Given the angular velocity, the curvature along a stroke piece is calculated as
- The static shape of the piece is an arc, characterized by
  - the initial direction, the constant curvature c0, and the arc length
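Since a piece with constant curvature is a circular arc, its points follow in closed form from the three parameters. A minimal Python sketch is given below; the names `theta0` (initial direction) and `D` (arc length) and the function name are assumptions, and only `c0` appears on the slide.

```python
import math

def arc_points(theta0, c0, D, n=20, start=(0.0, 0.0)):
    """Sample n + 1 points along an arc with initial direction theta0,
    constant curvature c0 and total arc length D."""
    pts = []
    for k in range(n + 1):
        s = D * k / n  # arc length travelled so far
        if abs(c0) < 1e-12:
            # zero curvature degenerates to a straight segment
            dx, dy = s * math.cos(theta0), s * math.sin(theta0)
        else:
            # integrate (cos, sin) of theta0 + c0 * s in closed form
            dx = (math.sin(theta0 + c0 * s) - math.sin(theta0)) / c0
            dy = (math.cos(theta0) - math.cos(theta0 + c0 * s)) / c0
        pts.append((start[0] + dx, start[1] + dy))
    return pts

# Example: a quarter of the unit circle, starting at the origin and heading right.
quarter = arc_points(0.0, 1.0, math.pi / 2)
```

Changing `theta0`, `c0` or `D` of a head or tail piece therefore bends, rotates or stretches that piece while keeping it a simple arc.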
35. Delta Log-normal Model - Example
("Why Handwriting Segmentation Can Be Misleading", IEEE ICPR 1996)
36. Conditional Sampling
- First, the trajectories of the synthesized handwriting letters are decomposed into static pieces
- The first piece of a trajectory is called the head piece, and the last piece is called the tail piece
- In the concatenation process, the trajectories of letters are deformed to produce natural cursive handwriting by changing the parameters of the head and the tail pieces
- A deformation energy of a stroke is defined as
- A concatenation energy between the i th letter
and the (i1) th letter is defined as - By minimizing the second and the third items, the
two letters are forced to connect with each other
smoothly and naturally
38. Conditional Sampling
- The concatenation energy of a whole word is calculated as
- We must ensure that the deformed letters are consistent with the models
- The sampling energy is calculated as
- The whole energy formulation is finally given as
39. Synthesis - Iterative Approach
- Step 1: Randomly generate a vector b(i) for each letter initially
- Step 2: Generate the trajectories Si of the letters and calculate an affine transform Ti for each letter (transforming it to its desired position)
- Step 3: For each pair of adjacent letters Si, Si+1, deform the pieces in these letters to minimize the concatenation energy Ec(i, i+1)
- Step 4: Project the deformed shape into the model coordinate frame
- Step 5: Update the model parameters
- Step 6: If not converged, return to step 2
40. Experimental Results
41. Discussion and Conclusion
- Performance is limited by the samples used for training, since the shape models can only generate novel shapes within the variation of the training samples
- Although some experimental results are shown, it is still not known how to make an objective evaluation of the synthesized scripts and compare different synthesis approaches
42. Markov chains
- A Markov chain on a space X with transitions T is a random process (an infinite sequence of random variables) $(x^{(0)}, x^{(1)}, \dots, x^{(t)}, \dots)$ that satisfies
  $p(x^{(t)} \mid x^{(t-1)}, \dots, x^{(0)}) = T(x^{(t)} \mid x^{(t-1)})$
- That is, the probability of being in a particular state at time t given the state history depends only on the state at time t-1
- If the transition probabilities are fixed for all t, the chain is considered homogeneous
43. Stationary distribution
- Consider the Markov chain given above
- The stationary distribution is