1
ICA - Are the strokes the independent components
in Hindi handwriting?
  • Rajesh Chepuri (Y3111036) crajesh_at_
  • Venkata Rao Chimata (Y3111052) venkatch_at_

2
Motivation
  • How to recognize handwritten Hindi characters?
  • Different people write each character differently
    (variation in size, thickness and style).
  • What are the natural features of handwritten
    characters, and how can we arrive at them
    automatically?

3
Different Methods to Recognize Handwritten Hindi
Characters
  • Eigen Decomposition
  • Independent Component Analysis
  • Singular Value Decomposition
  • Non-Negative Matrix Factorization
  • Neural network
  • Hidden Markov Models

4
Overview of Each Model
  • In each model, X represents the set of characters
    in matrix representation.
  • Ref: Character Decompositions, S H Srinivasan,
    K R Ramakrishnan, Suvrat Budhlakoti

5
Eigen Decomposition
  • The covariance matrix of the data matrix is
    subjected to eigen value decomposition.
  • The covariance of x is defined as
  • C = (1/n) Σ (xi − µ)(xi − µ)^T
  • where µ = (1/n) Σ xi
  • and xi, 1 ≤ i ≤ n, are the data vectors.
  • Look for bases which diagonalize C.
  • This is given by C = U Λ U^T. The columns of U
    correspond to the eigenvectors of C and the
    diagonal entries of Λ correspond to the
    eigenvalues.
  • The eigenvectors are ordered according to their
    importance, as indicated by the associated
    eigenvalues.
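The decomposition on this slide can be sketched with NumPy on toy data (a minimal illustration; the variable names and the toy distribution are our own, not from the slides):

```python
import numpy as np

# Toy data matrix: 500 samples (rows) from a correlated 2-D distribution.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2)) @ np.array([[2.0, 0.5], [0.0, 1.0]])

mu = X.mean(axis=0)                      # mean vector
C = (X - mu).T @ (X - mu) / len(X)       # covariance matrix C

# Eigen decomposition of the symmetric matrix C: C = U diag(lam) U^T.
lam, U = np.linalg.eigh(C)

# Order the eigenvectors by decreasing eigenvalue (importance).
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]

# U diagonalizes C: U^T C U is (numerically) the diagonal matrix of
# eigenvalues.
D = U.T @ C @ U
```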

6
Singular Value Decomposition
  • Generalization of eigen decomposition. (Eigen
    decomposition is defined for square matrices
    only, whereas SVD is defined for all matrices.)
  • The m-by-n matrix X is decomposed as
  • X = U S V^T
  • The diagonal entries of S are known as singular
    values.
  • The columns of U form a basis for the column space
    of X and the columns of V form a basis for the row
    space of X.
  • Application: document analysis
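As a quick sketch (on a toy matrix of our own, not the character data), NumPy's SVD recovers this factorization exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 4))      # m-by-n with m != n: SVD still applies

# Thin SVD: U is 6x4, s holds the 4 singular values, Vt is V^T (4x4).
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Reconstruction X = U S V^T.
X_rec = U @ np.diag(s) @ Vt
```

The singular values in `s` come back sorted in decreasing order, mirroring the eigenvalue ordering on the previous slide.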

7
Non-Negative Matrix Factorization
  • Non- Negative Matrix Factorization attempts a
    factorization in which the components have
    non-negative entries.
  • The NMF of X is given by
  • X = WH, where the factors W and H contain
    non-negative entries only.
  • H is the mixing matrix.
  • Columns of W are the components.
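The slides do not specify an NMF algorithm; one common choice is the Lee–Seung multiplicative-update rule, sketched here on a synthetic low-rank non-negative matrix (the function `nmf` and all names are our own):

```python
import numpy as np

def nmf(X, r, iters=500, eps=1e-9):
    """Factor a non-negative m x n matrix X as W (m x r) times H (r x n)
    using Lee-Seung multiplicative updates (a minimal sketch)."""
    rng = np.random.default_rng(0)
    m, n = X.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(iters):
        # Each update multiplies by a non-negative ratio, so W and H
        # stay non-negative by construction.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic data with an exact rank-3 non-negative factorization.
rng = np.random.default_rng(2)
X = rng.random((8, 3)) @ rng.random((3, 5))
W, H = nmf(X, r=3)
err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```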

8
Special Case: Characters of Indian Languages
  • The characters have component structure (the
    graphic representation of a character is made of
    different strokes, and a stroke can stand as a
    shorthand for some other character).
  • Stroke-based analysis of the input is essential.
  • Learn the optimal features (strokes) from examples
    of characters.

9
Previous Work
  • S H Srinivasan, K R Ramakrishnan, S Bhagavathy.
    The independent components of characters are
    'strokes'.
  • S H Srinivasan, K R Ramakrishnan, Suvrat
    Budhlakoti Character Decompositions.
  • Hindi Character Recognition by Mandeep Singh
    Chauhan.
  • Online Character Recognition by Krishnan.

10
What we are doing
  • We implement the algorithm proposed by S H
    Srinivasan, K R Ramakrishnan, S Bhagavathy in
    The independent components of characters are
    'strokes'.
  • Compare the results obtained with the other
    methods of character recognition.

11
Formal Model
  • Independent component analysis (ICA) is a
    statistical model where the observed data is
    expressed as a linear combination of underlying
    latent variables.
  • The task is to find both the latent variables and
    the mixing process.
  • Formally, x = As
  • where x = (x1, x2, …, xm)^T is the vector of
    observed random variables,
  • s = (s1, s2, …, sn)^T is the vector of statistically
    independent latent variables called independent
    components, and
  • A is an unknown mixing matrix.

12
Algorithm
  • A fast fixed-point algorithm (FastICA)
    proposed by Aapo Hyvärinen and Erkki Oja.

13
Basic Steps in ICA
  • Preconditions
  • The matrix A is of full rank.
  • For simplicity, assume that m = n.
  • ICA basically involves two steps.
  • - Data Preprocessing
  • In this we center the data by removing its
    mean. We also remove the correlations between the
    components of the data.
  • - Extraction of Independent Components
  • We use Fast ICA to extract the independent
    components.

14
How to separate the Independent Components
  • FastICA is a computationally efficient and
    robust fixed-point algorithm for independent
    component analysis.
  • The independent components s in the ICA model are
    found by searching for a matrix W.

15
Data Preprocessing
  • Let E denote the matrix of eigenvectors of the
    covariance matrix cov(x) and D a diagonal matrix
    of the corresponding eigenvalues,
  • where D = diag(λ1, λ2, …, λm).
  • The whitened data vector corresponding to the
    data vector x is given by
  • v = D^(-1/2) E^T x
  • The matrix V = D^(-1/2) E^T is called the whitening
    matrix.
  • Then v = Vx = VAs = Bs.
  • The matrix B is called the whitened mixing
    matrix.
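Centering and whitening as defined on this slide can be sketched as follows (the 2x2 toy mixing matrix is our own; rows are variables, columns are samples):

```python
import numpy as np

rng = np.random.default_rng(3)
# Correlated data: rows are variables, columns are samples.
x = np.array([[1.0, 0.4], [0.0, 1.0]]) @ rng.standard_normal((2, 2000))

xc = x - x.mean(axis=1, keepdims=True)   # centering: remove the mean

C = xc @ xc.T / xc.shape[1]              # covariance matrix cov(x)
lam, E = np.linalg.eigh(C)               # C = E D E^T

V = np.diag(lam ** -0.5) @ E.T           # whitening matrix V = D^(-1/2) E^T
v = V @ xc                               # whitened data: cov(v) = I
```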

16
Extraction of the independent components
  • The image data is first converted to a vector
    using the row major representation.
  • The main idea here is the optimization of a
    contrast function, such as the kurtosis. The
    kurtosis of a random variable y is defined as
  • kurt(y) = E{y^4} - 3 (E{y^2})^2
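A quick numerical check of this definition (on sample distributions of our own choosing): kurtosis is near zero for a Gaussian, negative for a sub-Gaussian (uniform) variable, and positive for a super-Gaussian (Laplacian) variable:

```python
import numpy as np

def kurt(y):
    """kurt(y) = E{y^4} - 3 (E{y^2})^2, estimated from samples."""
    return np.mean(y ** 4) - 3 * np.mean(y ** 2) ** 2

rng = np.random.default_rng(4)
n = 200_000
g = rng.standard_normal(n)        # Gaussian: kurtosis ~ 0
u = rng.uniform(-1, 1, n)         # uniform: sub-Gaussian, kurtosis < 0
lap = rng.laplace(size=n)         # Laplacian: super-Gaussian, kurtosis > 0
```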

17
Extraction of the components (contd.)
  • Consider optimizing the kurtosis of the
    projection of a zero-mean whitened random
    variable v onto a vector w,
  • f(w) = E{(w^T v)^4} - 3 ||w||^4
  • with the constraint h(w) = ||w||^2 - 1 = 0.
  • Necessary conditions for an optimum are given by
    the method of Lagrange multipliers:
  • 4 E{(w^T v)^3 v} - 12 ||w||^2 w + 2 λ w = 0

18
Extraction of the components (contd.)
  • Setting β = -λ/2 and noting that ||w||^2 = 1,
  • β w = E{(w^T v)^3 v} - 3 w
  • Therefore the weight update rule becomes
  • w(k+1) = E{(w(k)^T v)^3 v} - 3 w(k)
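The whole pipeline, from x = As through whitening to this fixed-point update, can be sketched on synthetic sources (the mixing matrix A, the source distributions, and all names here are our own toy choices, not the slides' data; we add the usual renormalization to the unit sphere after each update):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000

# Two independent non-Gaussian sources, mixed linearly: x = A s.
s = np.vstack([rng.uniform(-1, 1, n), rng.laplace(size=n)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])
x = A @ s

# Whiten as on the earlier slides: v = D^(-1/2) E^T (x - mean).
xc = x - x.mean(axis=1, keepdims=True)
lam, E = np.linalg.eigh(xc @ xc.T / n)
v = np.diag(lam ** -0.5) @ E.T @ xc

# One-unit fixed-point iteration: w <- E{(w^T v)^3 v} - 3 w,
# projected back to the unit sphere after each step.
w = rng.standard_normal(2)
w /= np.linalg.norm(w)
for _ in range(100):
    w = np.mean((w @ v) ** 3 * v, axis=1) - 3 * w
    w /= np.linalg.norm(w)

y = w @ v   # one recovered independent component (up to sign and scale)
```

At convergence, y matches one of the original sources up to sign and scale; repeating with deflation (orthogonalizing against previously found w's) would extract the rest.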

19
Implementation
20
Data
  • IITK Devanagari handwritten character set
  • 44 characters, 60 images each
  • Written by 33 individuals
  • 32x32 gray scale images

21
Experiment Setting - 1
  • Number of characters: 10
  • Number of images per character: 5
  • Thinning: No

22
Results Experiment 1
23
Experiment Setting - 2
  • Number of characters: 10
  • Number of images per character: 5
  • Thinning: Yes

24
Results Experiment 2
25
Experiment Setting - 3
  • Number of characters: 44
  • Number of images per character: 10
  • Number of independent components: 2010100
  • Thinning: Yes

26
Results Experiment 3
27
Experiment Setting - 4
  • Number of characters: 44
  • Number of images per character: 20
  • Number of independent components: 8888588
  • Thinning: Yes

28
Results Experiment 4
29
30
Independent Components
31
Stroke like characters
32
Extension
  • Assumption
  • The spoken words from different individuals are
    different linear combinations of the same
    independent components.

33
Data
  • Wave samples of spoken words:
  • one, two, three ... nine.
  • We added silence at the end to make them equal in
    length.
  • We perform preprocessing in the same way as we
    did for centering the character images.

34
Experiment - 1
  • Number of numbers: 9
  • Number of samples per number: 5

35
Results - Experiment 1
36
Experiment - 2
  • Number of numbers: 9
  • Number of samples per number: 9

37
Results Experiment 2
When the test data is also whitened
38
Conclusion
  • In the recognition process, we must look for the
    natural features of the data.
  • In our work we have chosen the independent
    components as the natural features.
  • In effect, this makes the recognition process
    simpler:
  • e.g. instead of working with 16,000 samples per
    wave file, we reduce the data to the dimension of
    the independent components.

39
References
  • [1] S. H. Srinivasan, K. R. Ramakrishnan, S.
    Bhagavathy. The Independent Components of
    Characters are 'Strokes'. ICDAR 1999, 414-417.
  • [2] A. Hyvärinen and E. Oja. A fast fixed-point
    algorithm for independent component analysis.
    Tech. Rep. A35, Helsinki University of
    Technology, Laboratory of Computer and
    Information Science, 1996.
  • [3] S. H. Srinivasan, K. R. Ramakrishnan, S.
    Budhlakoti. Character Decompositions. ICVGIP
    2002.

40
Websites
  • Whitening: http://www.cis.hut.fi/aapo/papers/IJCNN99_tutorialweb/node26.html
  • ICA for dummies: http://www.sccn.ucsd.edu/arno/indexica.html
  • HUT CIS, The FastICA package for MATLAB:
    http://www.cis.hut.fi/projects/ica/fastica/

41
Thank You